addition, the extended cDNAs may contain the full coding sequence of the gene from which the EST
was derived or, alternatively, the extended cDNAs may include portions of the coding sequence of 
the gene from which the EST was derived. It will be appreciated that there may be several extended 
cDNAs which include the EST sequence as a result of alternate splicing or the activity of alternative 
promoters. Alternatively, ESTs having partially overlapping sequences may be identified and 
contigs comprising the consensus sequences of the overlapping ESTs may be identified. 
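
As a rough illustration of how partially overlapping ESTs could be joined into a contig, the following Python sketch merges two reads when a suffix of one exactly matches a prefix of the other. The function name, the minimum-overlap parameter, and the example reads are hypothetical and are not taken from the present disclosure; exact matching is a simplifying assumption, and a practical assembly would have to tolerate sequencing errors and compute a true consensus.

```python
def merge_overlapping(left: str, right: str, min_overlap: int = 10):
    """Merge two sequences if a suffix of `left` equals a prefix of `right`.

    Returns the merged contig, or None when no exact overlap of at least
    `min_overlap` bases exists. Exact matching is an assumption made for
    brevity; it is not the method described in the disclosure.
    """
    left, right = left.upper(), right.upper()
    max_len = min(len(left), len(right))
    # Try the longest candidate overlap first.
    for size in range(max_len, min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return left + right[size:]
    return None


# Hypothetical reads sharing an 11-base overlap (CGGTACGATTA).
est_a = "ATGGCGTTCACCGGTACGATTA"
est_b = "CGGTACGATTAGGCTTAACC"
contig = merge_overlapping(est_a, est_b, min_overlap=8)
```

Trying the longest overlap first keeps the merged sequence as short as the data allow, which is one simple way of avoiding duplicated bases at the junction.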

In the past, these short EST sequences were often obtained from oligo-dT primed cDNA 
libraries. Accordingly, they mainly corresponded to the 3' untranslated region of the mRNA. In 
part, the prevalence of EST sequences derived from the 3' end of the mRNA is a result of the fact 
that typical techniques for obtaining cDNAs are not well suited for isolating cDNA sequences
derived from the 5' ends of mRNAs (Adams et al., Nature 377:3-174, 1996; Hillier et al., Genome
Res. 6:807-828, 1996), the entire disclosures of which are incorporated herein by reference.

In addition, in those reported instances where longer cDNA sequences have been obtained, 
the reported sequences typically correspond to coding sequences and do not include the full 5' 
untranslated region (5'UTR) of the mRNA from which the cDNA is derived. Indeed, 5'UTRs have 
been shown to affect either the stability or translation of mRNAs. Thus, regulation of gene 
expression may be achieved through the use of alternative 5'UTRs as shown, for instance, for the 
translation of the tissue inhibitor of metalloprotease mRNA in mitogenically activated cells 
(Waterhouse et al., J Biol Chem. 265:5585-9, 1990), the entire disclosure of which is incorporated
herein by reference. Furthermore, modification of 5'UTR through mutation, insertion or
translocation events may even be implicated in pathogenesis. For instance, the fragile X syndrome,
the most common cause of inherited mental retardation, is partly due to an insertion of multiple 
CGG trinucleotides in the 5'UTR of the fragile X mRNA resulting in the inhibition of protein 
synthesis via ribosome stalling (Feng et al., Science 268:731-4, 1995), the entire disclosure of
which is incorporated herein by reference. An aberrant mutation in regions of the 5'UTR known to 
inhibit translation of the proto-oncogene c-myc was shown to result in upregulation of c-myc 
protein levels in cells derived from patients with multiple myelomas (Willis et al., Curr Top
Microbiol Immunol 224:269-76, 1997), the entire disclosure of which is incorporated herein by
reference. In addition, the use of oligo-dT primed cDNA libraries does not allow the isolation of 
complete 5'UTRs since the incomplete sequences obtained by this process may not include the first
exon of the mRNA, particularly in situations where the first exon is short. Furthermore, they may 
not include some exons, often short ones, which are located upstream of splicing sites. Thus, there is 
a need to obtain sequences derived from the 5' ends of mRNAs. 
undesirable phenotype as a result of a mutation in such a coding sequence, the undesirable phenotype
may be corrected by introducing a normal coding sequence using gene therapy. Alternatively, if the 
undesirable phenotype results from overexpression of the protein encoded by the coding sequence, 
expression of the protein may be reduced using antisense or triple helix based strategies. 

The secreted or non-secreted human polypeptides encoded by the coding sequences may 
also be used as therapeutics by administering them directly to an individual having a condition, such 
as a disease, resulting from a mutation in the sequence encoding the polypeptide. In such an 
instance, the condition can be cured or ameliorated by administering the polypeptide to the 
individual. 

In addition, the secreted or non-secreted human polypeptides or portions thereof may be 
used to generate antibodies useful in determining the tissue type or species of origin of a biological 
sample. The antibodies may also be used to determine the cellular localization of the secreted or 
non-secreted human polypeptides or the cellular localization of polypeptides which have been fused 
to the human polypeptides. In addition, the antibodies may also be used in immunoaffinity
chromatography techniques to isolate, purify, or enrich the human polypeptide or a target 
polypeptide which has been fused to the human polypeptide. 

Public information on the number of human genes for which the promoters and upstream 
regulatory regions have been identified and characterized is quite limited. In part, this may be due to 
the difficulty of isolating such regulatory sequences. Upstream regulatory sequences such as 
transcription factor binding sites are typically too short to be utilized as probes for isolating 
promoters from human genomic libraries. Recently, some approaches have been developed to 
isolate human promoters. One of them consists of making a CpG island library (Cross et al., Nature
Genetics 6:236-244, 1994), the entire disclosure of which is incorporated herein by reference. The
second consists of isolating human genomic DNA sequences containing SpeI binding sites by the use
of SpeI binding protein (Mortlock et al., Genome Res. 6:327-335, 1996), the entire disclosure of
which is incorporated herein by reference. Both of these approaches have their limits due to a lack of
specificity and of comprehensiveness. Thus, there exists a need to identify and systematically
characterize the 5' portions of the genes.

The present 5' ESTs may be used to efficiently identify and isolate 5'UTRs and upstream 
regulatory regions which control the location, developmental stage, rate, and quantity of protein 
synthesis, as well as the stability of the mRNA. Once identified and characterized, these regulatory 
regions may be utilized in gene therapy or protein purification schemes to obtain the desired amount 
and locations of protein synthesis or to inhibit, reduce, or prevent the synthesis of undesirable gene 
products. 
Preferably, the enriched 5' ESTs represent 15% or more of the number of nucleic acid inserts in the
population of recombinant backbone molecules. More preferably, the enriched 5' ESTs represent
50% or more of the number of nucleic acid inserts in the population of recombinant backbone
molecules. In a highly preferred embodiment, the enriched 5' ESTs represent 90% or more of the
number of nucleic acid inserts in the population of recombinant backbone molecules.

"Stringent," "moderate," and "low" hybridization conditions are as defined below.
The term "polypeptide" refers to a polymer of amino acids without regard to the length of 
the polymer; thus, peptides, oligopeptides, and proteins are included within the definition of 
polypeptide. This term also does not specify or exclude post-expression modifications of 
polypeptides, for example, polypeptides which include the covalent attachment of glycosyl 
groups, acetyl groups, phosphate groups, lipid groups and the like are expressly encompassed by 
the term polypeptide. Also included within the definition are polypeptides which contain one or 
more analogs of an amino acid (including, for example, non-naturally occurring amino acids, 
amino acids which only occur naturally in an unrelated biological system, modified amino acids 
from mammalian systems etc.), polypeptides with substituted linkages, as well as other 
modifications known in the art, both naturally occurring and non-naturally occurring. 

As used interchangeably herein, the terms "nucleic acids," "oligonucleotides," and
"polynucleotides" include RNA, DNA, or RNA/DNA hybrid sequences of more than one
nucleotide in either single chain or duplex form. The term "nucleotide" is used herein as an
adjective to describe molecules comprising RNA, DNA, or RNA/DNA hybrid sequences of any
length in single-stranded or duplex form. The term "nucleotide" is also used herein as a noun to
refer to individual nucleotides or varieties of nucleotides, meaning a molecule, or individual unit
in a larger nucleic acid molecule, comprising a purine or pyrimidine, a ribose or deoxyribose
sugar moiety, and a phosphate group, or phosphodiester linkage in the case of nucleotides within
an oligonucleotide or polynucleotide. The term "nucleotide" is also used herein to
encompass "modified nucleotides" which comprise at least one of the following modifications: (a) an alternative
linking group, (b) an analogous form of purine, (c) an analogous form of pyrimidine, or (d) an
analogous sugar. For examples of analogous linking groups, purines, pyrimidines, and sugars, see
for example PCT publication No. WO 95/04064. The polynucleotide sequences of the invention
may be prepared by any known method, including synthetic, recombinant, ex vivo generation, or a 
combination thereof, as well as utilizing any purification methods known in the art. 

The terms "base paired" and "Watson & Crick base paired" are used interchangeably
herein to refer to nucleotides which can be hydrogen bonded to one another by virtue of their
sequence identities in a manner like that found in double-helical DNA with thymine or uracil 

S:\SH-RESP\GEN\T121C1\Spec (clean-nov200I)rtf.doc



9 Docket No.: GEN-T121C1 

Serial No. 09/471,276 

be referred to hereinafter as "full-length cDNAs." These cDNAs may comprise a 3' untranslated
region and possibly a polyadenylation tail. These cDNAs may also include DNA derived from
mRNA sequences upstream of the translation start site. The full-length cDNA sequences may be 
used to express the proteins corresponding to the 5' ESTs. As discussed above, secreted proteins and
non-secreted proteins may be therapeutically important. Thus, the proteins expressed from the
cDNAs may be useful in treating and controlling a variety of human conditions. The 5' ESTs may
also be used to obtain the corresponding genomic DNA. The term "corresponding genomic DNA" 
refers to the genomic DNA which encodes the mRNA from which the 5' EST was derived. 

Alternatively, the 5' ESTs may be used to obtain and express extended cDNAs encoding
portions of the protein. In the case of secreted proteins, the portions may comprise the signal 
peptides of the secreted proteins or the mature proteins generated when the signal peptide is cleaved 
off. 

The present invention includes isolated, purified, or enriched "EST-related nucleic acids." 
The terms "isolated," "purified" or "enriched" have the meanings provided above. As used herein, 
the term "EST-related nucleic acids" means the nucleic acids of SEQ ID NOs. 24-811 and 1600-
1622, extended cDNAs obtainable using the nucleic acids of SEQ ID NOs. 24-811 and 1600-1622,
full-length cDNAs obtainable using the nucleic acids of SEQ ID NOs. 24-811 and 1600-1622 or 
genomic DNAs obtainable using the nucleic acids of SEQ ID NOs. 24-811 and 1600-1622. The 
present invention also includes the sequences complementary to the EST-related nucleic acids. 

The present invention also includes isolated, purified, or enriched "fragments of EST-related 
nucleic acids." The terms "isolated," "purified" and "enriched" have the meanings described above. 
As used herein the term "fragments of EST-related nucleic acids" means fragments comprising at 
least 10, 12, 15, 18, 20, 23, 25, 28, 30, 35, 40, 50, 75, 100, 200, 300, 500, or 1000 consecutive 
nucleotides of the EST-related nucleic acids to the extent that fragments of these lengths are 
consistent with the lengths of the particular EST-related nucleic acids being referenced. In particular, 
fragments of EST-related nucleic acids refer to "polynucleotides described in Table II," 
"polynucleotides described in Table III," and "polynucleotides described in Table IV." The present 
invention also includes the sequences complementary to the fragments of the EST-related nucleic 
acids. 
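
The definition of fragments above can be sketched in Python as follows. The list of lengths is the one recited above; the function names and the example sequence are hypothetical, and the sketch is only an illustration of the phrase "to the extent that fragments of these lengths are consistent with the lengths of the particular EST-related nucleic acids," read here as meaning that a fragment length applies only when it does not exceed the sequence length.

```python
# Minimum fragment lengths recited in the definition above.
FRAGMENT_LENGTHS = (10, 12, 15, 18, 20, 23, 25, 28, 30, 35, 40,
                    50, 75, 100, 200, 300, 500, 1000)


def applicable_fragment_lengths(sequence: str):
    """Return the recited lengths that do not exceed the sequence length."""
    return [n for n in FRAGMENT_LENGTHS if n <= len(sequence)]


def fragments(sequence: str, length: int):
    """Yield every run of `length` consecutive nucleotides in `sequence`."""
    for start in range(len(sequence) - length + 1):
        yield sequence[start:start + length]


# Hypothetical 42-nucleotide sequence used only for illustration.
example = "ATGGCGTTCACCGGTACGATTAGGCTTAACCGGATCCTTAGC"
usable = applicable_fragment_lengths(example)
forty_mers = list(fragments(example, 40))
```

For a 42-nucleotide sequence, only the recited lengths up to 40 apply, and there are three distinct windows of 40 consecutive nucleotides.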

The present invention also includes isolated, purified, or enriched "positional segments of 
EST-related nucleic acids." As used herein, the term "positional segments of EST-related nucleic 
acids" includes segments comprising nucleotides 1-25, 26-50, 51-75, 76-100, 101-125, 126-150, 
151-175, 176-200, 201-225, 226-250, 251-300, 301-325, 326-350, 351-375, 376-400, 401-425, 426-
450, 451-475, 476-500, 501-525, 526-550, 551-575, 576-600 and 601 to the terminal nucleotide of the
EST-related nucleic acids to the extent that such nucleotide positions are consistent with the lengths
of the particular EST-related nucleic acids being referenced. The term "positional segments of EST-
related nucleic acids" also includes segments comprising nucleotides 1-50, 51-100, 101-150, 151-200,
201-250, 251-300, 301-350, 351-400, 401-450, 451-500, 501-550, 551-600 or 601 to the terminal
nucleotide of the EST-related nucleic acids to the extent that such nucleotide positions are consistent 
with the lengths of the particular EST-related nucleic acids being referenced. The term "positional 
segments of EST-related nucleic acids" also includes segments comprising nucleotides 1-100, 101-
200, 201-300, 301-400, 401-500, 501-600, or 601 to the terminal nucleotide of the EST-related nucleic
acids to the extent that such nucleotide positions are consistent with the lengths of the particular 
EST-related nucleic acids being referenced. In addition, the term "positional segments of EST-
related nucleic acids" includes segments comprising nucleotides 1-200, 201-400, 401-600, or 601 to the
terminal nucleotide of the EST-related nucleic acids to the extent that such nucleotide positions are 
consistent with the lengths of the particular EST-related nucleic acids being referenced. The present 
invention also includes the sequences complementary to the positional segments of EST-related 
nucleic acids. 
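
The positional segments defined above follow a simple pattern: fixed-width windows up to nucleotide 600, followed by a final segment running from nucleotide 601 to the terminal nucleotide. The Python sketch below generates such segments for a regular window width. The function name and parameters are hypothetical, and the sketch assumes that a fixed-width segment is "consistent with the length" of a sequence only when it lies entirely within that sequence.

```python
def positional_segments(seq_length: int, width: int, fixed_limit: int = 600):
    """Return 1-based, inclusive (start, end) positional segments.

    Fixed-width segments are produced up to `fixed_limit`; a final segment
    from `fixed_limit + 1` to the terminal nucleotide is added when the
    sequence is longer than `fixed_limit`. Segments that would run past the
    end of the sequence are omitted (an assumption of this sketch).
    """
    segments = []
    start = 1
    while start + width - 1 <= min(seq_length, fixed_limit):
        segments.append((start, start + width - 1))
        start += width
    if seq_length > fixed_limit:
        segments.append((fixed_limit + 1, seq_length))
    return segments


short_est = positional_segments(130, 25)   # 130-nucleotide sequence
long_est = positional_segments(650, 200)   # 650-nucleotide sequence
```

With these inputs, the 130-nucleotide sequence yields five 25-nucleotide segments ending at position 125, and the 650-nucleotide sequence yields 1-200, 201-400, 401-600, and 601-650.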

The present invention also includes isolated, purified, or enriched "fragments of positional 
segments of EST-related nucleic acids." As used herein, the term "fragments of positional segments
of EST-related nucleic acids" refers to fragments comprising at least 10, 15, 18, 20, 23, 25, 28, 30, 
35, 40, 50, 75, 100, 150, or 200 consecutive nucleotides of the positional segments of EST-related 
nucleic acids. The present invention also includes the sequences complementary to the fragments of 
positional segments of EST-related nucleic acids. 

The present invention also includes isolated or purified "EST-related polypeptides." As used 
herein, the term "EST-related polypeptides" means the polypeptides encoded by the EST-related 
nucleic acids, including the polypeptides of SEQ ID NOs. 812-1599.

The present invention also includes isolated or purified "fragments of EST-related 
polypeptides." As used herein, the term "fragments of EST-related polypeptides" means fragments 
comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids of an 
EST-related polypeptide to the extent that fragments of these lengths are consistent with the lengths 
of the particular EST-related polypeptides being referenced. In particular, fragments of EST-related 
polypeptides refer to polypeptides encoded by "polynucleotides described in Table II,"
"polynucleotides described in Table III," and "polynucleotides described in Table IV." 

The present invention also includes isolated or purified "positional segments of EST-related 
polypeptides." As used herein, the term "positional segments of EST-related polypeptides" includes 
polypeptides comprising amino acid residues 1-25, 26-50, 51-75, 76-100, 101-125, 126-150, 151- 
Another embodiment of the present invention is a purified or isolated polypeptide
comprising, consisting essentially of, or consisting of a sequence selected from the group 
consisting of SEQ ID NOs. 1554-1580. 

Another embodiment of the present invention is a purified or isolated polypeptide 
comprising, consisting essentially of, or consisting of a mature protein of a polypeptide selected 
from the group consisting of SEQ ID NOs. 1554-1580. 

Another embodiment of the present invention is a purified or isolated polypeptide 
comprising, consisting essentially of, or consisting of a signal peptide of a sequence selected from 
the group consisting of the polypeptides of SEQ ID NOs. 812-1516 and 1554-1580. 

Another embodiment of the present invention is a purified or isolated polypeptide 
comprising, consisting essentially of, or consisting of at least 12, 15, 18, 20, 23, 25, 28, 30, 35, 40,
50, 75, 100, 200, 300, 500, or 1000 consecutive amino acids, to the extent that fragments of these 
lengths are consistent with the specific sequence, of a sequence selected from the group consisting 
of the sequences of SEQ ID NOs. 812-1599. 

Another embodiment of the present invention is a method of making a cDNA comprising 
the steps of contacting a collection of mRNA molecules from human cells with a primer 
comprising at least 12, 15, 18, 20, 23, 25, 28, 30, 35, 40, or 50 consecutive nucleotides of a 
sequence selected from the group consisting of the sequences complementary to SEQ ID NOs.
24-811 and SEQ ID NOs. 1600-1622, hybridizing said primer to an mRNA in said collection that
encodes said protein, reverse transcribing said hybridized primer to make a first cDNA strand
from said mRNA, making a second cDNA strand complementary to said first cDNA strand, and
isolating the resulting cDNA encoding said protein comprising said first cDNA strand and said 
second cDNA strand. 

Another embodiment of the present invention is a purified cDNA obtainable by the 
method of the preceding paragraph. 

In one aspect of this embodiment, the cDNA encodes at least a portion of a human 
polypeptide. Preferably, said human polypeptide comprises at least 8, 10, 12, 15, 18, 20, 23, 25, 
28, 30, 35, 40, 50, 75, 100, 200, 300, 500, or 1000 consecutive amino acids, to the extent that 
fragments of these lengths are consistent with the specific sequence, of a sequence encoded by a 
sequence selected from the group consisting of the sequences of SEQ ID NOs. 24-811. More 
preferably, said human polypeptide comprises the polypeptide encoded by a sequence selected 
from the group consisting of the sequences of SEQ ID NOs. 24-811. In one aspect of this 
embodiment, said cDNA comprises the complete coding sequence of said human polypeptide. 
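
The primer used in the cDNA-making method above is drawn from a sequence complementary to an EST, so that it can hybridize to the corresponding mRNA. The following Python sketch shows one way such a primer could be derived from a sense-strand sequence. The function names and the example sequence are hypothetical; the minimum length of 12 nucleotides is the shortest primer length recited above.

```python
# Translation table mapping each DNA base to its Watson-Crick complement.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def candidate_primer(est: str, start: int, length: int) -> str:
    """Return a primer complementary to est[start:start + length].

    `start` is a 0-based offset into the sense-strand sequence.
    """
    if length < 12:
        raise ValueError("primer should contain at least 12 nucleotides")
    target = est[start:start + length]
    if len(target) < length:
        raise ValueError("requested primer extends past the end of the sequence")
    return reverse_complement(target)


# Hypothetical sense-strand sequence, not taken from the Sequence Listing.
primer = candidate_primer("ATGGCGTTCACCGGTACG", 0, 12)
```

The primer is written 5' to 3', which is why the complement of the target region is reversed.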
the oligonucleotide tag to the mRNA, the integrity of the mRNA was then examined by performing a
Northern blot using a probe complementary to the oligonucleotide tag. 

EXAMPLE 3 

cDNA Synthesis Using mRNA Templates Having Intact 5' Ends
For the mRNAs joined to oligonucleotide tags, first strand cDNA synthesis was performed 
using a reverse transcriptase with random nonamers as primers. In order to protect internal EcoRI 
sites in the cDNA from digestion at later steps in the procedure, methylated dCTP was used for first 
strand synthesis. After removal of mRNA by an alkaline hydrolysis, the first strand of cDNA was 
precipitated using isopropanol in order to eliminate residual primers. 

The second strand of the cDNA was synthesized with a Klenow fragment using a primer 
corresponding to the 5' end of the ligated oligonucleotide. Methylated dCTP was also used for
second strand synthesis in order to protect internal EcoRI sites in the cDNA from digestion during
the cloning process. 

EXAMPLE 4 

Cloning of cDNAs derived from mRNA with intact 5' ends into BlueScript
Following second strand synthesis, the ends of the cDNA were blunted with T4 DNA 
polymerase (Biolabs) and the cDNA was digested with EcoRI. Since methylated dCTP was used 
during cDNA synthesis, the EcoRI site present in the tag was the only hemi-methylated site, hence 
the only site susceptible to EcoRI digestion. The cDNA was then size fractionated using exclusion 
chromatography (AcA, Biosepra) and fractions corresponding to cDNAs of more than 150 bp were
pooled and ethanol precipitated. The cDNA was directionally cloned into the SmaI and EcoRI ends
of the phagemid pBlueScript vector (Stratagene). The ligation mixture was electroporated into 
bacteria and propagated under appropriate antibiotic selection. 

EXAMPLE 5 

Selection of Clones Having the Oligonucleotide Tag Attached Thereto 
Clones containing the oligonucleotide tag attached were then selected as follows. The 
plasmid DNAs containing 5' EST libraries made as described above were purified (Qiagen). A 
positive selection of the tagged clones was performed as follows. Briefly, in this selection procedure, 
the plasmid DNA was converted to single stranded DNA using gene II endonuclease of the phage F1
in combination with an exonuclease (Chang et al., Gene 127:95-8, 1993), the entire disclosure of
which is incorporated herein by reference, such as exonuclease III or T7 gene 6 exonuclease. The
resulting single stranded DNA was then purified using paramagnetic beads as described by Fry et al.,
Biotechniques, 13:124-131, 1992, the entire disclosure of which is incorporated herein by reference.
In this procedure, the single stranded DNA was hybridized with a biotinylated oligonucleotide 
having a sequence corresponding to the 3' end of the oligonucleotide tag. Clones including a 
sequence complementary to the biotinylated oligonucleotide were captured by incubation with 
streptavidin coated magnetic beads followed by magnetic selection. After capture of the positive 
clones, the plasmid DNA was released from the magnetic beads and converted into double stranded 
DNA using a DNA polymerase such as the Thermosequenase obtained from Amersham Pharmacia 
Biotech. The double stranded DNA was then electroporated into bacteria. The percentage of 
positive clones having the 5' tag oligonucleotide was estimated using dot blot analysis to typically be 
between 90 and 98%. 

Following electroporation, the libraries were ordered in 384-well microtiter plates (MTP). A
copy of the MTP was stored for future needs. Then the libraries were transferred into 96-well MTP and
sequenced as described below. 



EXAMPLE 6 

Sequencing of Inserts in Selected Clones 
Plasmid inserts were first amplified by PCR on PE-9600 thermocyclers (Perkin-Elmer, 
Applied Biosystems Division, Foster City, CA), using standard SETA-A and SETA-B primers 
(Genset SA), AmpliTaqGold (Perkin-Elmer), dNTPs (Boehringer), buffer and cycling conditions as 
recommended by the Perkin-Elmer Corporation. 

PCR products were then sequenced using automatic ABI Prism 377 sequencers (Perkin 
Elmer). Sequencing reactions were performed using PE 9600 thermocyclers with standard dye- 
primer chemistry and ThermoSequenase (Amersham Pharmacia Biotech). The primers used were 
either T7 or 21M13 (available from Genset SA) as appropriate. The primers were labeled with the
JOE, FAM, ROX and TAMRA dyes. The dNTPs and ddNTPs used in the sequencing reactions 
were purchased from Boehringer. Sequencing buffer, reagent concentrations and cycling conditions 
were as recommended by Amersham. 

Following the sequencing reaction, the samples were precipitated with ethanol, resuspended 
in formamide loading buffer, and loaded on a standard 4% acrylamide gel. Electrophoresis was 
performed for 2.5 hours at 3000V on an ABI 377 sequencer, and the sequence data were collected 
and analyzed using the ABI Prism DNA Sequencing Analysis Software, version 2.1.2. 
EXAMPLE 7

Obtaining 5' ESTs from Extended cDNA libraries 
Obtained from mRNA with Intact 5' Ends 
Alternatively, 5' ESTs may be isolated from other cDNA or genomic DNA libraries. Such
cDNA or genomic DNA libraries may be obtained from a commercial source or made using other 
techniques familiar to those skilled in the art. One example of such cDNA library construction, a 
full-length cDNA library, is as follows. 

PolyA+ RNAs are prepared and their quality checked as described in Example 1. Then, the 
caps at the 5' ends of the polyA+ RNAs are specifically joined to an oligonucleotide tag as described
in Example 2. The oligonucleotide tag may contain a restriction site such as Eco RI to facilitate 
further subcloning procedures. Northern blotting is then performed to check the size of mRNAs 
having the oligonucleotide tag attached thereto and to ensure that the mRNAs are actually tagged. 

First strand synthesis is subsequently carried out for mRNAs joined to the oligonucleotide 
tag as described in Example 3 above except that the random nonamers are replaced by an oligo-dT 
primer. For instance, this oligo-dT primer may contain an internal tag of 4 nucleotides which is 
different from one tissue to the other. Following second strand synthesis using a primer contained in 
the oligonucleotide tag attached to the 5' end of mRNA, the blunt ends of the obtained double
stranded full-length DNAs are modified into cohesive ends to facilitate subcloning. For example, 
the extremities of full-length cDNAs may be modified to allow subcloning into the Eco RI and Hind 
III sites of a Bluescript vector using the Eco RI site of the oligonucleotide tag and the addition of a 
Hind III adaptor to the 3' end of full-length cDNAs.

The full-length cDNAs are then separated into several fractions according to their sizes 
using techniques familiar to those skilled in the art. For example, electrophoretic separation may be 
applied in order to yield 3 or 6 different fractions. Following gel extraction and purification, the 
cDNA fractions are subcloned into appropriate vectors, such as Bluescript vectors, transformed into 
competent bacteria and propagated under appropriate antibiotic conditions. Subsequently, plasmids 
containing tagged full-length cDNAs are positively selected as described in Example 5. 

The 5' end of full-length cDNAs isolated from such cDNA libraries may then be sequenced 
as described in Example 6 to yield 5' ESTs.

II. Computer Analysis of the Isolated 5' ESTs: Construction of the SignalTag™ Database

The sequence data from the cDNA libraries made as described above were transferred to a 
database, where quality control and validation steps were performed. A base-caller, working using a 
Unix system, automatically flagged suspect peaks, taking into account the shape of the peaks, the 
the ORF encoded by the extended cDNA is basically the same as the one encoded by the consensus
contigated 5' EST or 5' EST.

Alternatively, to confirm that the chosen ORF actually encodes a polypeptide, the consensus 
contigated 5'EST or 5'EST may be used to obtain an extended cDNA using any of the techniques 
described therein, and especially those described in Examples 19 and 20. Such an extended cDNA 
may then be inserted into an appropriate expression vector and used to express the polypeptide 
encoded by the extended cDNA as described therein. The expressed polypeptide may be isolated, 
purified, or enriched as described therein. Several methods known to those skilled in the art may 
then be used to determine whether the expressed polypeptide is the one actually encoded by the 
chosen ORF, therein referred to as the expected polypeptide. Such methods are based on the
determination of predictable features of the expressed polypeptide, including but not limited to its 
amino acid sequence, its size or its charge, and the comparison of these features to those predicted for 
the expected polypeptide. The following paragraphs present examples of such methods. 

One of these methods consists in the determination of at least a portion of the amino acid 
sequence of the expressed polypeptide using any technique known to those skilled in the art. For 
example, the amino-terminal residues may be determined using techniques either based on Sanger's 
technique of acid hydrolysis of a polypeptide whose N-terminal residue has been covalently labeled
or using techniques based on Edman degradation of polypeptides whose N-terminal residues are
sequentially labeled and cleaved from the polypeptide of interest. The amino acid sequence of the 
expressed polypeptide may then be compared to the one predicted for the expected polypeptide using 
any algorithm and parameters described therein. 

Alternatively, the size of the expressed polypeptides may be determined using techniques 
familiar to those skilled in the art such as Coomassie blue or silver staining and subsequently 
compared to the size predicted for the expected polypeptide. Generally, the band corresponding to 
the expressed polypeptide will have a mobility near that expected based on the number of amino 
acids in the open reading frame of the extended cDNA. However, the band may have a mobility 
different than that expected as a result of modifications such as glycosylation, ubiquitination, or 
enzymatic cleavage. 
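
The size comparison described above can be approximated numerically. The Python sketch below estimates the mass of the expected polypeptide from the number of nucleotides in the open reading frame, using the common rule of thumb of about 110 daltons per amino acid residue. The function name, the rule-of-thumb constant, and the example ORF length are assumptions introduced for illustration; they are not values from the present disclosure, and the estimate ignores modifications such as glycosylation or cleavage.

```python
# Rule-of-thumb average mass of one amino acid residue, in daltons.
# This is a generic approximation, not a value from the disclosure.
AVERAGE_RESIDUE_MASS_DA = 110.0


def predicted_mass_kda(orf_nucleotides: int, includes_stop: bool = True) -> float:
    """Estimate the unmodified polypeptide mass (kDa) from the ORF length."""
    if orf_nucleotides <= 0 or orf_nucleotides % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    codons = orf_nucleotides // 3
    # The stop codon, when counted in the ORF length, encodes no residue.
    residues = codons - 1 if includes_stop else codons
    return residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


# Hypothetical 903-nucleotide ORF (300 codons plus a stop codon).
expected_kda = predicted_mass_kda(903)
```

A band migrating far from the estimated mass would then prompt a check for the modifications mentioned above rather than being taken as proof that the wrong polypeptide was expressed.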

Alternatively, specific antibodies or antipeptides may be generated against the expected 
polypeptide as described in Example 34 and used to perform immunoblotting or 
immunoprecipitation studies against the expressed polypeptide. The presence of a band in samples 
from cells containing the expression vector with the extended cDNA which is absent in samples from 
cells containing the expression vector encoding an irrelevant polypeptide indicates that the expected 
polypeptide or portion thereof is being expressed. Generally, the band corresponding to the 
SEQ ID NOs. 24-728 are nucleic acids having an incomplete ORF which encodes a signal
peptide. The locations of the incomplete ORFs and sequences encoding signal peptides are listed in 
the accompanying Sequence Listing. In addition, the von Heijne score of the signal peptide 
computed as described in Example 13 is listed as the "score" in the accompanying Sequence Listing. 
The sequence of the signal-peptide is listed as "seq" in the accompanying Sequence Listing. The "/" 
in the signal peptide sequence indicates the location where proteolytic cleavage of the signal peptide 
occurs to generate a mature protein. 
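
As a small illustration of the "/" convention described above, the following Python sketch splits a "seq" entry at the cleavage marker. The function name and the example entry are hypothetical; the example is not taken from the accompanying Sequence Listing.

```python
def split_signal_peptide(seq_field: str):
    """Split a signal-peptide "seq" entry at its "/" cleavage marker.

    Returns (signal_part, downstream_part): the residues listed before the
    cleavage site and the residues listed after it.
    """
    if seq_field.count("/") != 1:
        raise ValueError("expected exactly one '/' cleavage marker")
    signal_part, downstream_part = seq_field.split("/")
    return signal_part, downstream_part


# Hypothetical entry used only for illustration.
signal_part, downstream_part = split_signal_peptide("LLALLSLAGA/QP")
```

The residues after the marker are the first residues of the mature protein as listed in the entry.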

SEQ ID NOs. 729-765 are nucleic acids having an incomplete ORF in which no sequence
encoding a signal peptide has been identified to date. However, it remains possible that subsequent 
analysis will identify a sequence encoding a signal peptide in these nucleic acids. The locations of the 
incomplete ORFs are listed in the accompanying Sequence Listing. 

SEQ ID NOs. 766-792 are nucleic acids having a complete ORF which encodes a signal 
peptide. The locations of the complete ORFs and of the signal peptides, the von Heijne score of the 
signal peptide, the sequence of the signal-peptide and the proteolytic cleavage site are indicated as 
described above. 

SEQ ID NOs. 793-811 are nucleic acids having a complete ORF in which no sequence
encoding a signal peptide has been identified to date. However, it remains possible that subsequent 
analysis will identify a sequence encoding a signal peptide in these nucleic acids. The locations of the 
complete ORFs are listed in the accompanying Sequence Listing. 
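
The four groups of nucleic acid sequences described above can be summarized as a lookup. The Python sketch below encodes only the SEQ ID NO. ranges stated in the preceding paragraphs; the table layout and the function name are hypothetical conveniences, and the flag reflects only whether a signal peptide has been identified to date.

```python
# (SEQ ID NO. range, ORF status, signal peptide identified to date)
NUCLEIC_ACID_CLASSES = (
    (range(24, 729), "incomplete ORF", True),    # SEQ ID NOs. 24-728
    (range(729, 766), "incomplete ORF", False),  # SEQ ID NOs. 729-765
    (range(766, 793), "complete ORF", True),     # SEQ ID NOs. 766-792
    (range(793, 812), "complete ORF", False),    # SEQ ID NOs. 793-811
)


def classify_seq_id(seq_id: int):
    """Return (ORF status, signal peptide identified) for a SEQ ID NO."""
    for id_range, orf_status, has_signal in NUCLEIC_ACID_CLASSES:
        if seq_id in id_range:
            return orf_status, has_signal
    raise ValueError(f"SEQ ID NO. {seq_id} is outside the ranges 24-811")


first_group = classify_seq_id(24)
last_group = classify_seq_id(811)
```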

SEQ ID NOs. 812-1516 are "incomplete polypeptide sequences" which include a signal 
peptide. "Incomplete polypeptide sequences" are polypeptide sequences encoded by nucleic acids in 
which a start codon has been identified but no stop codon has been identified. These polypeptides 
are encoded by the nucleic acids of SEQ ID NOs. 24-728. The location of the signal peptide, the von 
Heijne score of the signal peptide, the sequence of the signal-peptide and the proteolytic cleavage site 
are indicated as described above. 

SEQ ID NOs. 1517-1553 are incomplete polypeptide sequences in which no signal peptide 
has been identified to date. However, it remains possible that subsequent analysis will identify a 
signal peptide in these polypeptides. These polypeptides are encoded by the nucleic acids of SEQ ID 
NOs. 729-765. 

SEQ ID NOs. 1554-1580 are "complete polypeptide sequences" which include a signal 
peptide. "Complete polypeptide sequences" are polypeptide sequences encoded by nucleic acids in 
which a start codon and a stop codon have been identified. These polypeptides are encoded by the 
nucleic acids of SEQ ID NOs. 766-792. The location of the signal peptide, the von Heijne score of 






37 Docket No.: GEN-T121C1 

Serial No. 09/471,276 

fragments are then defined by a range of nucleotide positions from the SEQ IDs of the consensus 
contigated 5' ESTs as indicated in the second column entitled "positions of preferred fragments." 
The preferred polynucleotide fragments correspond to the individual 5' ESTs aligned to obtain the 
consensus contigated 5' EST and to those filed in the priority documents. The third column entitled 
"Variant nucleotides" describes the nucleotide sequence variations observed between the consensus 
contigated 5' EST and preferred nucleic acid fragments as follows: 

A) Substitutions in the sequence of a consensus contigated 5' EST to derive a 
preferred polynucleotide fragment are denoted by an "S" followed by a number indicating 
the first nucleotide position in a specific SEQ ID to be substituted in a string of substituted 
nucleotides or the position of the substituted nucleotide in the case of a single substituted 
nucleotide. Then there is a comma followed by one or more lower case letters indicating the 
identity of the nucleotide(s) occurring in the substituted position(s). For example, SEQ ID 
NO: 3401; Position of preferred fragments: 1-250; Variant nucleotides: S45,atc would 
indicate that a preferred polynucleotide fragment had the sequence of positions 1 to 250 of 
SEQ ID NO. 3401, except that the nucleotides at positions 45, 46, and 47 were substituted 
with A, T, and C, respectively, in the preferred polynucleotide as compared with the 
sequence of SEQ ID NO. 3401. 

B) Insertions in the sequence of a consensus contigated 5' EST to derive a preferred 
polynucleotide fragment are denoted by an "I" followed by a number indicating the 
nucleotide position in a specific SEQ ID after which a string of nucleotides is inserted or the 
position after which the nucleotide is inserted in the case of a single inserted nucleotide. 
Then there is a comma followed by one or more lower case letters indicating the identity of the 
nucleotide(s) occurring in the inserted position(s). For example, SEQ ID NO: 7934; Position 
of preferred fragments: 1-500; Variant nucleotides: I36,gataca would indicate that a 
preferred polynucleotide fragment had the sequence of positions 1 to 500 of SEQ ID NO. 
7934, except that after the nucleotide at position 36 a GATACA string of nucleotides is 
inserted in the preferred polynucleotide as compared with the sequence of SEQ ID NO. 7934. 

C) Deletions in the sequence of a consensus contigated 5' EST to derive a preferred 
nucleic acid fragment are denoted by a "D" followed by a number indicating the first 
nucleotide position in a specific SEQ ID to be deleted in a string of deleted nucleotides or 
the position of the deleted nucleotide in the case of a single deleted nucleotide. Then there is 
a comma followed by a number indicating the number of nucleotide(s) deleted from the 
sequence provided in the sequence ID. For example, SEQ ID NO: 5398; Position of 
preferred fragments: 56-780; Variant nucleotides: D114,5 would indicate that a preferred 
SEQ ID NO   Positions of Preferred Fragments  Variant nucleotides

194         1-215                             S50, s; S186, sn; S199, k; I215, gcagcggg
213         1-158                             S12S, m; I132, w; S143, d; I158, tgcccggg
223         3-431                             D1, 2; S28, s; S79, c; S82, s; S308, nr; S328, nb; I431, ccggc
247         1-359                             I76, gttt; I359, tccctsg
258         1-236                             S72, r; S81, g; S197, s; I205, ss; S232, k; I236, acttcggg
264         5-283                             D1, 4; S64, g; S122, m; S134, yy; I137, c; 151, t; I283, gttgc
269         1-143                             S111, s; I143, ggggcggg
286         5-207                             D1, 4; S204, a; S206, c; I207, gg; D208, 567
287         1-277                             S114, r; I125, t; S131, ag; S256, tg; S259, tt; S262, at; S267, t; S269, c; S273, c; I277, ccggg; D278, 337
289         69-416                            D1, 68; I416, agccaggg
289         1-278                             S114, r; I125, t; S131, ae; S277, c; I278, cggg; D279, 138
292         20-254                            D1, 19; I254, aaagagg
293         1-414                             I414, tagcag
300         1-285                             S16, m; S67, y; I285, baccacggg; D286, 1
349         23-431                            D1, 22; I118, a; S214, y; I431, caactgg
350         3-386                             D1, 2; S42, w; I263, c; I386, gggat
368         3-446                             D1, 2; I446, tctct
385         1-193                             I35, t; ri08, t; I134, r; S135, a; S137, r; S143, w; I178, c; I193, gagcgggg
411         6-391                             D1, 5; S17, r; S27, t; S334, y; D392, 244
412         1-185                             S49, s; S127, s; I185, gctggg; D186, 150
415         2-229                             D1, 1; S3, a; I229, caaatggg
435         1-386                             S4, s; I386, ccggg
436         4-472                             D1, 3; S61, sa; D238, 1; S239, s; I472, agtgtgg
437         1-340                             I340, ggg; D341, 129
441         1-409                             S109, smag; I409, cgcacggg
454         1-492                             S72, nn; S115, t; S121, bwy; S181, yn; I492, gagtc
455         1-177                             I14, w; I16, a; I177, gagctggg
459         1-311                             S39, n; S74, rg; I311, accatggg
460         1-425                             I425, agtac
461         5-420                             D1, 4; I420, tcgtc
481         1-429                             IIP, w; S262, d; S333, n; I429, ctccaggg
489         1-414                             D72, 1; S117, n; S396, d; I414, ggaca
496         1-215                             I215, ttttcggg
501         1-430                             S275, n; I430, aggat
502         91-413                            D1, 90; I413, aaacgggg
SEQ ID NO.  Positions of Preferred Fragments  Variant nucleotides

797         1-420                             S136, c; S150, c; I245, ccc; I420, ggagtg
798         25-316                            D1, 24; S315, g; D317, 97
799         1-344                             D345, 57
800         7-465                             D1, 6; S59, k; S146, a; S186, km; I465, gttca
801         121-422                           D1, 120; I269, c; S419, cc; I422, gg; D423, 207
802         46-477                            D1, 45; S132, bn; I477, actac
803         15-467                            D1, 14; S45, k; S65, t; S418, ys; D452, 1; D468, 119
804         1-341                             S42, t; S97, d; S326, gtg; S331, tgt; S336, a; S338, c; I341, cccccggg; D342, 218
805         2-409                             D1, 1; S334, d; I409, aggg; D410, 161
806         5-384                             D1, 4; I384, actaa
807         1-301                             S113, a; S117, c; S123, t; D128, 1; D134, 1; S282, g; S284, a; I301, gacggagggg; D302, 70
808         2-314                             D1, 1; S306, g; I314, ggg; D315, 121
809         1-394                             S53, g; S228, n; S272, vk; I301, g; I358, m; S368, nb; S375, w; I383, mm; I388, yt; I394, nhaccaag
810         6-205                             10, a; D1, 5; I141, t; I205, ggg; D206, 630
811         6-270                             D1, 5; I270, gggg; D271, 115
1600        1-247                             S45, m; S114, k; I122, m; S123, yc; S158, rr; S221, k; I247, ccccaggg
1601        1-225                             S109, bm; S195, m; I225, tgcacggg
1602        23-245                            D1, 22; D138, 1; S139, s; S242, t; S244, g; I245, g; D246, 13 j
1603        1-303                             S71, c; D277, 1; I303, agaggga; D304, 38
1604        1-242                             S47, vv; S50, c; S81, h; S85, d; S91, k; S106, r; I242, tgtggg; D243, 50
1605        2-225                             D1, 1; S20, k; S91, c; I225, ggg; D226, 132
1606        15-293                            D1, 14; S156, g; S193, g; I200, t; I293, acaaaggg
1607        1-361                             S323, c; I361, cccca
1608        1-151                             I151, taagggg; D152, 154
1609        1-242                             S55, s; I135, a; S152, h; I242, cagtaggg
1610        1-196                             I151, w; S190, k; I196, cctgtgg
1611        1-228                             S115, k; S174, rk; I228, cgtttggg
1612        1-221                             S108, v; I221, tgatcggg
1613        1-281                             I66, w; I137, a; D282, 79
1614        1-171                             S53, k; S76, k; I80, k; S81, kw; S86, r; S92, k; S126, k; I171, gccgagg
1615        2-193                             D1, 1; S67, c; I121, s; S122, mm; S126, g; S130, r; S146, r; S156, gm; I193, cctca
SEQ ID NO.  Positions of Preferred Fragments  Variant nucleotides

1616        1-349                             S251, ww; S259, rs; S275, k; I279, w; S285, y; S292, y; I320, m; I331, m; I338, w; I341, s; I349, accccggg
1617        1-129                             I118, t; D130, 26
1618        1-184                             D9, 1; D185, 1
1619        1-169                             I122, t; I169, gcccaggg
1620        1-187                             S106, k; S118, m; S122, eg; S132, k; D188, 59
1621        1-153                             D125, 1; I131, ttt; S152, t; I153, gg; D154, 127
1622        1-400                             S43, s; I126, g; I129, y; S353, d; I400, tatat
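
The "S", "I" and "D" notation used in the "Variant nucleotides" column can be applied mechanically to a consensus sequence. The following is a minimal sketch in Python, not part of the original disclosure. The function name `apply_variants` is hypothetical, and the sketch assumes that all positions are 1-based positions in the consensus SEQ ID, that entries are separated by semicolons, and that an insertion placed after the last position of the fragment is appended to the fragment.

```python
import re

# One entry looks like "S45,atc", "I36,gataca" or "D114,5".
_ENTRY = re.compile(r"^([SID])\s*(\d+)\s*,\s*([A-Za-z]+|\d+)$")


def apply_variants(consensus, start, end, variants):
    """Return positions start..end (1-based, inclusive) of `consensus`
    after applying the semicolon-separated S/I/D entries in `variants`."""
    bases = list(consensus)  # bases[i] is what remains at consensus position i + 1
    inserted = {}            # position -> string inserted after that position
    for raw in variants.split(";"):
        raw = raw.strip()
        if not raw:
            continue
        match = _ENTRY.match(raw)
        if match is None:
            raise ValueError(f"unrecognized variant entry: {raw!r}")
        kind, pos, arg = match.group(1), int(match.group(2)), match.group(3)
        if pos < 1:
            raise ValueError(f"position must be 1 or greater: {raw!r}")
        if kind == "D":
            # arg is the number of nucleotides deleted, starting at pos
            last = min(len(bases), pos - 1 + int(arg))
            for i in range(pos - 1, last):
                bases[i] = ""
        elif not arg.isalpha():
            raise ValueError(f"expected nucleotide letters: {raw!r}")
        elif kind == "S":
            # arg is the string of substituted nucleotides, starting at pos
            for offset, base in enumerate(arg):
                bases[pos - 1 + offset] = base
        else:
            # kind == "I": arg is inserted after position pos
            inserted[pos] = inserted.get(pos, "") + arg
    return "".join(
        bases[i] + inserted.get(i + 1, "") for i in range(start - 1, end)
    )
```

For instance, under these assumptions `apply_variants(seq, 1, 250, "S45,atc")` would return positions 1 to 250 of `seq` with positions 45, 46 and 47 replaced by a, t and c.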



EXAMPLE 16 

Categorization of 5' ESTs and Consensus Contigated 5'ESTs 
The nucleic acid sequences of the present invention (SEQ ID NOs. 24-811 and 1600-1622) 
were grouped based on their homology to known sequences as follows. All sequences were 
compared to EMBL release 57 and daily releases available at the time of filing using BLASTN. All 
matches with a minimum of 25 nucleotides with 90% homology were retrieved and used to compute 
Tables III and IV. 
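
The retrieval criterion stated above (a match of at least 25 nucleotides with 90% homology) can be illustrated with a short Python sketch. It is not part of the original disclosure: the `Hit` record and the function `retained_matches` are hypothetical names, and "homology" is assumed here to mean the percent identity over the aligned region reported for each BLASTN match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One database match (hypothetical record, for illustration only)."""
    subject_id: str
    alignment_length: int    # number of aligned nucleotides
    percent_identity: float  # assumed stand-in for "% homology"


def retained_matches(hits, min_length=25, min_identity=90.0):
    """Keep the matches that meet both the length and the identity threshold."""
    return [
        hit
        for hit in hits
        if hit.alignment_length >= min_length
        and hit.percent_identity >= min_identity
    ]
```

A sequence for which `retained_matches` returns an empty list would fall in the category of sequences matching no known sequence, under the assumptions above.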

In some embodiments, 5' EST or consensus contigated 5' EST nucleic acid sequences do not 
match any known vertebrate sequence nor any publicly available EST sequence, thus being 
completely new. 

In other embodiments, 5' ESTs or consensus contigated 5' ESTs match a known sequence. 
Tables III and IV give, for each sequence of the invention in this category referred to by its sequence 
identification number in the first column, the positions of their preferred fragments in the second 
column entitled "Positions of preferred fragments." As used herein the term "polynucleotide 
described in Table III" refers to all of the preferred polynucleotide fragments defined in Table III 
in this manner, and the term "polynucleotide described in Table IV" refers to all of the preferred 
polynucleotide fragments defined in Table IV in this manner. The present invention encompasses 
isolated, purified, or recombinant nucleic acids which consist of, consist essentially of, or comprise a 
contiguous span of at least 8, 10, 12, 15, 18, 20, 25, 35, 40, 50, 70, 80, 100, 250, or 500 
nucleotides in length, to the extent that a contiguous span of these lengths is consistent with the 
lengths of the particular polynucleotide, of a polynucleotide described in Table III or Table IV, or a 
sequence complementary thereto, wherein said polynucleotide described in Table III or Table IV is 
selected individually or in any combination from the polynucleotides described in Table III or Table 
IV. The present invention also encompasses isolated, purified, or recombinant nucleic acids which 
SEQ ID NO   Positions of Preferred Fragments

782         1-59
783         1-53
784         1-220, 262-390
785         1-339, 408-461
786         1-28
789         1-58
791         1-126
792         1-31, 129-220
793         1-31
794         355-431
795         1-33
797         1-31
798         1-31
799         1-401
801         1-117
802         1-92
806         64-384
807         1-331
808         1-351
810         1-39
1600        1-25
1603        1-341
1606        1-31
1607        1-361
1608        164-305
1611        85-228
1612        1-221
1613        112-360
1614        1-171
1615        94-193
1617        1-155
1620        1-246



III. Evaluation of Spatial and Temporal Expression of mRNAs Corresponding to the 5' ESTs, 
Consensus Contigated 5' ESTs, or EST-related nucleic acids 

EXAMPLE 17 

Expression Patterns of mRNAs From Which the 5' ESTs were obtained 
Each of the SEQ ID NOs. 24-811 and 1600-1622 was also categorized based on the tissue 
from which its corresponding mRNA was obtained, as follows. 
Table V shows the spatial distribution of each nucleic acid sequence of the invention (SEQ 
ID NOs. 24-811 and 1600-1622) referred to by its sequence identification number in the first column. 
In the second column entitled tissue distribution, the spatial distribution is represented by the number 
of individual 5' ESTs used to assemble the consensus contigated 5' ESTs for a given tissue. Each 
type of tissue listed in Table V is encoded by a letter. The correspondence between the letter code 
and the tissue type is given in Table VI. 



Table V 

SEQ ID NO   Tissue Distribution 
EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related 
nucleic acids, or fragments of positional segments of EST-related nucleic acids may be more than 
500 nucleotides long. 

For example, quantitative analysis of gene expression may be performed with EST-related 
nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, 
or fragments of positional segments of EST-related nucleic acids in a complementary DNA 
microarray as described by Schena et al. (Science 270:467-470, 1995; Proc. Natl. Acad. Sci. U.S.A. 
93:10614-10619, 1996), the entire disclosures of which are incorporated herein by reference. 
EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related 
nucleic acids, or fragments of positional segments of EST-related nucleic acids are amplified by PCR 
and arrayed from 96-well microtiter plates onto silylated microscope slides using high-speed 
robotics. Printed arrays are incubated in a humid chamber to allow rehydration of the array elements 
and rinsed, once in 0.2% SDS for 1 min, twice in water for 1 min and once for 5 min in sodium 
borohydride solution. The arrays are submerged in water for 2 min at 95°C, transferred into 0.2% 
SDS for 1 min, rinsed twice with water, air dried and stored in the dark at 25°C. 

Cell or tissue mRNA is isolated or commercially obtained and probes are prepared by a 
single round of reverse transcription. Probes are hybridized to 1 cm² microarrays under a 14 x 14 
mm glass coverslip for 6-12 hours at 60°C. Arrays are washed for 5 min at 25°C in low stringency 
wash buffer (1 x SSC/0.2% SDS), then for 10 min at room temperature in high stringency wash 
buffer (0.1 x SSC/0.2% SDS). Arrays are scanned in 0.1 x SSC using a fluorescence laser scanning 
device fitted with a custom filter set. Accurate differential expression measurements are obtained by 
taking the average of the ratios of two independent hybridizations. 
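
The averaging step described above can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, and each hybridization is assumed to yield one background-corrected signal for the sample of interest and one for the reference sample on the same array element.

```python
def expression_ratio(sample_signal, reference_signal):
    """Ratio of sample signal to reference signal for one hybridization."""
    if reference_signal <= 0:
        raise ValueError("reference signal must be positive")
    return sample_signal / reference_signal


def averaged_differential_expression(hybridization_1, hybridization_2):
    """Average the ratios from two independent hybridizations.

    Each argument is a (sample_signal, reference_signal) pair.
    """
    ratio_1 = expression_ratio(*hybridization_1)
    ratio_2 = expression_ratio(*hybridization_2)
    return (ratio_1 + ratio_2) / 2.0
```

Note that the sketch averages the two ratios, as the text states, rather than taking the ratio of the averaged signals; the two are not equivalent in general.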

Quantitative analysis of the expression of genes may also be performed with EST-related 
nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, 
or fragments of positional segments of EST-related nucleic acids in complementary DNA arrays as 
described by Pietu et al. (Genome Research 6:492-503, 1996), the entire disclosure of which is 
incorporated herein by reference. The EST-related nucleic acids, fragments of EST-related nucleic 
acids, positional segments of EST-related nucleic acids, or fragments of positional segments of 
EST-related nucleic acids are PCR amplified and spotted on membranes. Then, mRNAs 
originating from various tissues or cells are labeled with radioactive nucleotides. After hybridization 
and washing in controlled conditions, the hybridized mRNAs are detected by phospho-imaging or 
autoradiography. Duplicate experiments are performed and a quantitative analysis of differentially 
expressed mRNAs is then performed. 

Alternatively, expression analysis of the EST-related nucleic acids, fragments of EST-related 
nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of 
EST-related nucleic acids can be done through high density nucleotide arrays as described by 
Lockhart et al. (Nature Biotechnology 14:1675-1680, 1996) and Sosnowsky et al. (Proc. Natl. Acad. 
Sci. 94:1119-1123, 1997), the entire disclosures of which are incorporated herein by reference. 
Oligonucleotides of 15-50 nucleotides corresponding to sequences of EST-related nucleic acids, 
fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments 
of positional segments of EST-related nucleic acids are synthesized directly on the chip (Lockhart et 
al., supra) or synthesized and then addressed to the chip (Sosnowsky et al., supra). Preferably, the 
oligonucleotides are about 20 to 25 nucleotides in length. 
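
As an illustration of how candidate oligonucleotides of the stated lengths could be enumerated from a sequence, the following Python sketch lists every window of a chosen length. The function name `tile_oligos`, the default length of 25 and the one-nucleotide step are assumptions made for this example; the cited references use their own probe selection rules, which are not reproduced here.

```python
def tile_oligos(sequence, length=25, step=1):
    """List (1-based start position, oligonucleotide) pairs of a fixed length.

    `length` must lie in the 15-50 nucleotide range given in the text.
    """
    if not 15 <= length <= 50:
        raise ValueError("oligonucleotide length must be between 15 and 50")
    if step < 1:
        raise ValueError("step must be at least 1")
    last_start = len(sequence) - length
    return [
        (i + 1, sequence[i:i + length])
        for i in range(0, last_start + 1, step)
    ]
```

A sequence shorter than the requested length yields an empty list.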

cDNA probes labeled with an appropriate compound, such as biotin, digoxigenin or 
fluorescent dye, are synthesized from the appropriate mRNA population and then randomly 
fragmented to an average size of 50 to 100 nucleotides. The said probes are then hybridized to the 
chip. After washing as described in Lockhart et al., supra, and application of different electric fields 
(Sosnowsky et al., supra), the dyes or labeling compounds are detected and quantified. Duplicate 
hybridizations are performed. Comparative analysis of the intensity of the signal originating from 
cDNA probes on the same target oligonucleotide in different cDNA samples indicates a differential 
expression of the mRNA corresponding to the 5' EST, consensus contigated 5' EST or extended 
cDNA from which the oligonucleotide sequence has been designed. 

IV. Use of 5' ESTs to Clone Extended cDNAs and to Clone the Corresponding Genomic DNAs 

Once 5' ESTs or consensus contigated 5' ESTs which include the 5' end of the 
corresponding mRNAs have been selected using the procedures described above, they can be utilized 
to isolate extended cDNAs which contain sequences adjacent to the 5' ESTs or consensus contigated 
5' ESTs. The extended cDNAs may include the entire coding sequence of the protein encoded by 
the corresponding mRNA, including the authentic translation start site. If the extended cDNA 
encodes a secreted protein, it may contain the signal sequence, and the sequence encoding the mature 
protein remaining after cleavage of the signal peptide. 

Extended cDNAs which include the entire coding sequence of the protein encoded by the 
corresponding mRNA are referred to herein as "full-length cDNAs." Alternatively, the extended 
cDNAs may not include the entire coding sequence of the protein encoded by the corresponding 
mRNA, although they do include sequences adjacent to the 5' ESTs or consensus contigated 5' ESTs. 
In some embodiments in which the extended cDNAs are derived from an mRNA encoding a secreted 

After removal of the mRNA hybridized to the first cDNA strand by alkaline hydrolysis, the 
products of the alkaline hydrolysis and the residual poly dT primer can be eliminated with an 
exclusion column. 

Subsequently, a pair of nested primers on each end is designed based on the known 5' 
sequence from the 5' EST or consensus contigated 5' EST and the known 3' end added by the poly 
dT primer used in the first strand synthesis. Software used to design primers is based either on GC 
content and melting temperatures of oligonucleotides, such as OSP (Hillier and Green, PCR Meth. 
Appl. 1:124-128, 1991), the entire disclosure of which is incorporated herein by reference, or 
on the octamer frequency disparity method (Griffais et al., Nucleic Acids Res. 19:3887-3891, 1991), 
the entire disclosure of which is incorporated herein by reference, such as PC-Rare 
(http://bioinformatics.weizmann.ac.il/software/PC-Rare/doc/manuel.html). Preferably, the nested primers 
at the 5' end and the nested primers at the 3' end are separated from one another by four to nine 
bases. These primer sequences may be selected to have melting temperatures and specificities 
suitable for use in PCR. 
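
To make the two quantities named above concrete, the following Python sketch computes the GC content of a candidate primer and a rough melting temperature. It is not the scoring used by OSP or PC-Rare: the melting temperature is estimated with the simple Wallace rule (2°C per A or T, 4°C per G or C), which is only a coarse approximation for short oligonucleotides, and the function names are hypothetical.

```python
def gc_content(primer):
    """Fraction of G and C nucleotides in the primer (0.0 to 1.0)."""
    primer = primer.upper()
    if not primer:
        raise ValueError("primer must not be empty")
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Approximate melting temperature in degrees Celsius (Wallace rule).

    Tm = 2 * (A + T) + 4 * (G + C); intended for short oligonucleotides only.
    """
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc
```

Candidate primer pairs could then be compared on these two values, for example by preferring pairs whose estimated melting temperatures are close to one another.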

A first PCR run is performed using the outer primer from each of the nested pairs. A second 
PCR run using the inner primer from each of the nested pairs is then performed on a small sample of 
the first PCR product. Thereafter, the primers and remaining nucleotide monomers are removed. 

2. Sequencing Extended cDNAs or Fragments Thereof 

Due to the lack of position constraints on the design of 5' nested primers compatible for 
PCR use using the OSP software, amplicons of two types are obtained. Preferably, the second 5' 
primer is located upstream of the translation initiation codon thus yielding a nested PCR product 
containing the entire coding sequence. Such an extended cDNA may be used in a direct cloning 
procedure as described in section a below. However, in some cases, the second 5' primer is located 
downstream of the translation initiation codon, thereby yielding a PCR product containing only part 
of the ORF. Such incomplete PCR products are submitted to a modified procedure described in 
section b below. 

a) Nested PCR products containing complete ORFs 

When the resulting nested PCR product contains the complete coding sequence, as predicted 
from the 5'EST or consensus contigated 5' EST sequence, it is directly cloned in an appropriate 
vector as described in section 3. 

b) Nested PCR products containing incomplete ORFs 

When the amplicon does not contain the complete coding sequence, intermediate steps 
are necessary to obtain both the complete coding sequence and a PCR product containing the full 
computer readable medium as described below and compared to one another using any of a variety 
of algorithms familiar to those skilled in the art, including those described below. 

To determine the level of homology between the polypeptide encoded by the hybridizing 
cDNA or genomic DNA and the polypeptide encoded by the 5' EST, consensus contigated 5' EST or 
extended cDNA from which the probe was derived, the polypeptide sequence encoded by the 
hybridized nucleic acid and the polypeptide sequence encoded by the 5' EST, consensus contigated 
5' EST or extended cDNA from which the probe was derived are compared. The sequences of the 
polypeptide encoded by the 5' EST, consensus contigated 5' EST or extended cDNA from which the 
probe was derived and the polypeptide sequence encoded by the cDNA or genomic DNA which 
hybridized to the detectable probe may be stored on a computer readable medium as described below 
and compared to one another using any of a variety of algorithms familiar to those skilled in the art, 
including those described below. 

Protein and/or nucleic acid sequence homologies may be evaluated using any of the 
variety of sequence comparison algorithms and programs known in the art. Such algorithms and 
programs include, but are by no means limited to, TBLASTN, BLASTP, FASTA, TFASTA, and 
CLUSTALW (Pearson and Lipman, 1988, Proc. Natl. Acad. Sci. USA 85(8):2444-2448; Altschul 
et al., 1990, J. Mol. Biol. 215(3):403-410; Thompson et al., 1994, Nucleic Acids Res. 22(22):4673- 
4680; Higgins et al., 1996, Methods Enzymol. 266:383-402; Altschul et al., 1990, J. Mol. Biol. 
215(3):403-410; Altschul et al., 1993, Nature Genetics 5:266-272), the entire disclosures of which 
are incorporated herein by reference. 

In a particularly preferred embodiment, protein and nucleic acid sequence homologies are 
evaluated using the Basic Local Alignment Search Tool ("BLAST") which is well known in the 
art (see, e.g., Karlin and Altschul, 1990, Proc. Natl. Acad. Sci. USA 87:2267-2268; Altschul et 
al., 1990, J. Mol. Biol. 215:403-410; Altschul et al., 1993, Nature Genetics 5:266-272; Altschul 
et al., 1997, Nuc. Acids Res. 25:3389-3402), the entire disclosures of which are incorporated herein 
by reference. In particular, five specific BLAST programs are used to perform the following tasks: 
(1) BLASTP and BLAST3 compare an amino acid query sequence against a protein 
sequence database; 

(2) BLASTN compares a nucleotide query sequence against a nucleotide sequence 
database; 

(3) BLASTX compares the six-frame conceptual translation products of a query 
nucleotide sequence (both strands) against a protein sequence database; 

(4) TBLASTN compares a query protein sequence against a nucleotide sequence 
database translated in all six reading frames (both strands); and 

(5) TBLASTX compares the six-frame translations of a nucleotide query sequence 
against the six-frame translations of a nucleotide sequence database. 

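
The correspondence listed above between each BLAST program and the kinds of query and database it compares can be summarized as a small lookup table. The Python sketch below is only a restatement of that list for illustration; the names `BLAST_PROGRAMS` and `programs_for` are hypothetical and do not belong to any BLAST distribution.

```python
# program name -> (query type, database type, what is translated in six frames)
BLAST_PROGRAMS = {
    "BLASTP": ("protein", "protein", "nothing"),
    "BLAST3": ("protein", "protein", "nothing"),
    "BLASTN": ("nucleotide", "nucleotide", "nothing"),
    "BLASTX": ("nucleotide", "protein", "query"),
    "TBLASTN": ("protein", "nucleotide", "database"),
    "TBLASTX": ("nucleotide", "nucleotide", "query and database"),
}


def programs_for(query_type, database_type):
    """Names of the listed programs that accept this query/database pairing."""
    return [
        name
        for name, (query, database, _translated) in BLAST_PROGRAMS.items()
        if query == query_type and database == database_type
    ]
```

For a nucleotide query against a nucleotide database, the list returns both BLASTN (direct comparison) and TBLASTX (comparison of translations).
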
The BLAST programs identify homologous sequences by identifying similar segments, 
which are referred to herein as "high-scoring segment pairs," between a query amino or nucleic 
acid sequence and a test sequence which is preferably obtained from a protein or nucleic acid 
sequence database. High-scoring segment pairs are preferably identified (i.e., aligned) by means 
of a scoring matrix, many of which are known in the art. Preferably, the scoring matrix used is 
the BLOSUM62 matrix (Gonnet et al., 1992, Science 256:1443-1445; Henikoff and Henikoff, 
1993, Proteins 17:49-61), the entire disclosures of which are incorporated herein by reference. 
Less preferably, the PAM or PAM250 matrices may also be used (see, e.g., Schwartz and 
Dayhoff, eds., 1978, Matrices for Detecting Distance Relationships: Atlas of Protein Sequence 
and Structure, Washington: National Biomedical Research Foundation), the entire disclosure of 
which is incorporated herein by reference. 

The BLAST programs evaluate the statistical significance of all high-scoring segment 
pairs identified, and preferably select those segments which satisfy a user-specified threshold of 
significance, such as a user-specified percent homology. Preferably, the statistical significance of 
a high-scoring segment pair is evaluated using the statistical significance formula of Karlin (see, 
e.g., Karlin and Altschul, 1990, Proc. Natl. Acad. Sci. USA 87:2267-2268), the entire disclosure 
of which is incorporated herein by reference. 

The parameters used with the above algorithms may be adapted depending on the sequence 
length and degree of homology studied. In some embodiments, the parameters may be the default 
parameters used by the algorithms in the absence of instructions from the user. 

In some embodiments, the level of homology between the hybridized nucleic acid and the 
extended cDNA, 5' EST, or consensus contigated 5' EST from which the probe was derived may 
be determined using the FASTDB algorithm described in Brutlag et al., Comp. App. Biosci. 6:237- 
245, 1990, the entire disclosure of which is incorporated herein by reference. In such analyses the 
parameters may be selected as follows: Matrix=Unitary, k-tuple=4, Mismatch Penalty=1, Joining 
Penalty=30, Randomization Group Length=0, Cutoff Score=1, Gap Penalty=5, Gap Size 
Penalty=0.05, Window Size=500 or the length of the sequence which hybridizes to the probe, 
whichever is shorter. Because the FASTDB program does not consider 5' or 3' truncations when 
calculating homology levels, if the sequence which hybridizes to the probe is truncated relative to the 
sequence of the extended cDNA, 5' EST, or consensus contigated 5' EST from which the probe was 

Alternatively, the level of polypeptide homology may be determined using the FASTDB 
algorithm described by Brutlag et al., Comp. App. Biosci. 6:237-245, 1990, the entire disclosure of 
which is incorporated herein by reference. In such analyses the parameters may be selected as 
follows: Matrix=PAM 0, k-tuple=2, Mismatch Penalty=1, Joining Penalty=20, Randomization 
Group Length=0, Cutoff Score=1, Window Size=Sequence Length, Gap Penalty=5, Gap Size 
Penalty=0.05, Window Size=500 or the length of the homologous sequence, whichever is shorter. If 
the homologous amino acid sequence is shorter than the amino acid sequence encoded by the 
extended cDNA, 5' EST, or consensus contigated 5' EST as a result of an N terminal and/or C 
terminal deletion, the results may be manually corrected as follows. First, the number of amino acid 
residues of the amino acid sequence encoded by the extended cDNA, 5' EST, or consensus 
contigated 5' EST which are not matched or aligned with the homologous sequence is determined. 
Then, the percentage of the length of the sequence encoded by the extended cDNA, 5' EST, or 
consensus contigated 5' EST which the non-matched or non-aligned amino acids represent is 
calculated. This percentage is subtracted from the homology level. For example, wherein the amino 
acid sequence encoded by the extended cDNA, 5' EST, or consensus contigated 5' EST is 100 amino 
acids in length and the length of the homologous sequence is 80 amino acids and wherein the amino 
acid sequence encoded by the extended cDNA or 5' EST is truncated at the N terminal end with 
respect to the homologous sequence, the homology level is calculated as follows. In the preceding 
scenario there are 20 non-matched, non-aligned amino acids in the sequence encoded by the 
extended cDNA, 5' EST, or consensus contigated 5' EST. This represents 20% of the length of the 
amino acid sequence encoded by the extended cDNA, 5' EST, or consensus contigated 5' EST. If the 
remaining amino acids are 100% identical between the two sequences, the homology level would be 
100%-20%=80% homology. No adjustments are made if the non-matched or non-aligned sequences 
are internal or under any other conditions. 
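
The manual correction described above is simple arithmetic and can be written out as a short Python sketch. The function name `corrected_homology` and its argument names are hypothetical; the sketch assumes that the caller has already counted only the N terminal and/or C terminal residues that are not matched or aligned, since internal non-matched residues are not adjusted for.

```python
def corrected_homology(query_length, unmatched_terminal_residues,
                       aligned_percent_identity):
    """Subtract the percentage of unmatched terminal residues from the homology.

    query_length: length of the sequence encoded by the extended cDNA or 5' EST
    unmatched_terminal_residues: N and/or C terminal residues not aligned
    aligned_percent_identity: homology level reported for the aligned residues
    """
    if query_length <= 0:
        raise ValueError("query length must be positive")
    if not 0 <= unmatched_terminal_residues <= query_length:
        raise ValueError("unmatched residues must lie between 0 and query length")
    penalty = 100.0 * unmatched_terminal_residues / query_length
    return aligned_percent_identity - penalty
```

With the figures of the example in the text (a 100 amino acid sequence, 20 unmatched terminal residues, 100% identity over the rest), `corrected_homology(100, 20, 100.0)` gives 80.0.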

In addition to the above described methods, other protocols are available to obtain extended 
cDNAs using 5' ESTs or consensus contigated 5' ESTs as outlined in the following paragraphs. 

Extended cDNAs may be prepared by obtaining mRNA from the tissue, cell, or organism of 
interest using mRNA preparation procedures utilizing polyA selection procedures or other 
techniques known to those skilled in the art. A first primer capable of hybridizing to the polyA tail 
of the mRNA is hybridized to the mRNA and a reverse transcription reaction is performed to 
generate a first cDNA strand. 

The first cDNA strand is hybridized to a second primer containing at least 10 consecutive 
nucleotides of the sequences of SEQ ID NOs 24-811 and 1600-1622. Preferably, the primer 
comprises at least 10, 12, 15, 17, 18, 20, 23, 25, or 28 consecutive nucleotides from the sequences of 
SEQ ID NOs 24-811 and 1600-1622. In some embodiments, the primer comprises more than 30 
nucleotides from the sequences of SEQ ID NOs 24-811 and 1600-1622. If it is desired to obtain 
extended cDNAs containing the full protein coding sequence, including the authentic translation 
initiation site, the second primer used contains sequences located upstream of the translation 
initiation site. The second primer is extended to generate a second cDNA strand complementary to 
the first cDNA strand. Alternatively, RT-PCR may be performed as described above using primers 
from both ends of the cDNA to be obtained. 

Extended cDNAs containing 5' fragments of the mRNA may be prepared by hybridizing an 
mRNA comprising the sequences of SEQ ID NOs. 24-811 and 1600-1622 with a primer comprising 
a sequence complementary to a fragment of an EST-related nucleic acid, hybridizing the primer to the mRNAs, 
and reverse transcribing the hybridized primer to make a first cDNA strand from the mRNAs. 
Preferably, the primer comprises at least 10, 12, 15, 17, 18, 20, 23, 25, or 28 consecutive nucleotides 
of the sequences complementary to SEQ ID NOs. 24-811 and 1600-1622. 

Thereafter, a second cDNA strand complementary to the first cDNA strand is synthesized. 
The second cDNA strand may be made by hybridizing a pnmer complementary to sequences in the 
first cDNA strand to the first cDNA strand and extending the primer to generate the second cDNA 
strand. 

The double stranded extended cDNAs made using the methods described above are isolated 
and cloned. The extended cDNAs may be cloned into vectors such as plasmids or viral vectors 
capable of replicating in an appropriate host cell. For example, the host cell may be a bacterial, 
mammalian, avian, or insect cell. 

Techniques for isolating mRNA, reverse transcribing a primer hybridized to mRNA to 
generate a first cDNA strand, extending a primer to make a second cDNA strand complementary to 
the first cDNA strand, isolating the double stranded cDNA and cloning the double stranded cDNA 
are well known to those skilled in the art and are described in Current Protocols in Molecular 
Biology, John Wiley & Sons, Inc. 1997 and Sambrook et al., Molecular Cloning: A Laboratory 
Manual, Second Edition, Cold Spring Harbor Laboratory Press, 1989. 

Alternatively, other procedures may be used for obtaining full-length cDNAs or extended 
cDNAs. In one approach, full-length or extended cDNAs are prepared from mRNA and cloned into 
double stranded phagemids as follows. The cDNA library in the double stranded phagemids is then 
rendered single stranded by treatment with an endonuclease, such as the Gene II product of the phage 
F1, and an exonuclease (Chang et al., Gene 127:95-8, 1993), the entire disclosure of which is 
incorporated herein by reference. A biotinylated oligonucleotide comprising the sequence of a 
fragment of an EST-related nucleic acid is hybridized to the single stranded phagemids. Preferably, 
that were screened for the presence of known protein signatures and motifs using the ProScan 
software from the GCG package and the Prosite 15.0 database are provided below. 

The protein of SEQ ID NO: 8 encoded by the full-length cDNA SEQ ID NO: 7 (internal 
designation 78-8-3-E6-CL0_lC) and expressed in adult prostate belongs to the 
phosphatidylethanolamine-binding protein family, of which it exhibits the characteristic PROSITE 
signature from positions 90 to 112. Proteins from this widespread family, found from nematodes to fly, 
yeast, rodent and primate species, bind hydrophobic ligands such as phospholipids and 
nucleotides. They are mostly expressed in brain and in testis and are thought to play a role in cell 
growth and/or maturation, in regulation of sperm maturation and motility, and in membrane 
remodeling. They may act either through signal transduction or through oxidoreduction reactions 
(for a review see Schoentgen and Jolles, FEBS Letters, 369:22-26 (1995), the entire disclosure of 
which is incorporated herein by reference). Taken together, these data suggest that the protein of 
SEQ ID NO: 8 may play a role in cell growth, maturation and in membrane remodeling and/or 
may be related to male fertility. Thus, this protein may be useful in diagnosing and/or treating 
cancer, neurodegenerative diseases, and/or disorders related to male fertility and sterility. 

The protein of SEQ ID NO. 10 encoded by the full-length cDNA SEQ ID NO. 9 (internal 
designation 108-013-5-O-H9-FLC) shows homologies with a family of lysophospholipases 
conserved among eukaryotes (yeast, rabbit, rodents and human). In addition, some members of 
this family exhibit a calcium-independent phospholipase A2 activity (Portilla et al., J. Am. Soc. 
Nephrol., 9:1178-1186 (1998), the entire disclosure of which is incorporated herein by reference). 
All members of this family exhibit the active site consensus GXSXG motif of carboxylesterases 
that is also found in the protein of SEQ ID NO. 10 (position 54 to 58). In addition, this protein 
may be a membrane protein with one transmembrane domain as predicted by the software 
TopPred II (Claros and von Heijne, CABIOS Applic. Notes, 10:685-686 (1994), the entire 
disclosure of which is incorporated herein by reference). Taken together, these data suggest that 
the protein of SEQ ID NO: 10 may play a role in fatty acid metabolism, probably as a 
phospholipase. Thus, this protein, or part thereof, may be useful in diagnosing and/or treating 
several disorders including, but not limited to, cancer, diabetes, and neurodegenerative disorders 
such as Parkinson's and Alzheimer's diseases. It may also be useful in modulating inflammatory 
responses to infectious agents and/or to suppress graft rejection. 
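
The GXSXG consensus mentioned above lends itself to a simple pattern scan. The following Python sketch is offered only as an illustration: the function name is hypothetical, the toy sequence is invented and is not the protein of SEQ ID NO. 10, and X is assumed to stand for any single residue. Positions are reported 1-based and inclusive, matching the "position 54 to 58" style used in the text.

```python
import re

# G-X-S-X-G, where X is any single residue. The lookahead lets overlapping
# occurrences be reported as well.
_GXSXG = re.compile(r"(?=(G[A-Z]S[A-Z]G))")


def find_gxsxg(protein: str) -> list:
    """Return (start, end, motif) tuples, 1-based and inclusive, for each GXSXG match."""
    hits = []
    for match in _GXSXG.finditer(protein.upper()):
        start = match.start() + 1          # convert 0-based index to 1-based position
        hits.append((start, start + 4, match.group(1)))
    return hits
```

On the invented sequence `"MKTAGHSLGAA"`, the scan reports a single motif `GHSLG` spanning positions 5 to 9.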

The protein of SEQ ID NO: 12 encoded by the full-length cDNA SEQ ID NO: 11 
(internal designation 108-004-5-0-D10-FLC) shows remote homology to a subfamily of 
beta4-galactosyltransferases widely conserved in animals (human, rodents, cow and chicken). Such 
enzymes, usually type II membrane proteins located in the endoplasmic reticulum or in the Golgi 
polyA signal, this sequence can be added to the construct by, for example, splicing out the polyA 
signal from pSG5 (Stratagene) using BglI and SalI restriction endonuclease enzymes and 
incorporating it into the mammalian expression vector pXT1 (Stratagene). pXT1 contains the LTRs 
and a portion of the gag gene from Moloney Murine Leukemia Virus. The position of the LTRs in 
the construct allows efficient stable transfection. The vector includes the Herpes Simplex thymidine 
kinase promoter and the selectable neomycin gene. The nucleic acid encoding the polypeptide to be 
expressed is obtained by PCR from the bacterial vector using oligonucleotide primers 
complementary to the nucleic acid encoding the protein or polypeptide to be expressed and 
containing restriction endonuclease sequences for PstI incorporated into the 5' primer and BglII at 
the 5' end of the 3' primer, taking care to ensure that the nucleic acid encoding the protein or polypeptide 
to be expressed is correctly positioned with respect to the polyA signal. The purified fragment 
obtained from the resulting PCR reaction is digested with PstI, blunt ended with an exonuclease, 
digested with BglII, purified and ligated to pXT1, now containing a polyA signal and digested with 
BglII. 

The ligated product is transfected into mouse NIH 3T3 cells using Lipofectin (Life 
Technologies, Inc., Grand Island, New York) under conditions outlined in the product specification. 
Positive transfectants are selected after growing the transfected cells in 600 µg/ml G418 (Sigma, St. 
Louis, Missouri). 

Alternatively, the nucleic acid encoding the protein or polypeptide to be expressed may be 
cloned into pED6dpc2. The resulting pED6dpc2 constructs may be transfected into a suitable host 
cell, such as COS 1 cells. Methotrexate resistant cells are selected and expanded. The expressed 
protein or polypeptide may be isolated, purified, or enriched as described above. 

To confirm expression of the desired protein or polypeptide, the proteins or polypeptides 
produced by cells containing a vector with a nucleic acid insert encoding the protein or polypeptide 
are compared to those lacking such an insert. The expressed proteins are detected using techniques 
familiar to those skilled in the art such as Coomassie blue or silver staining or using antibodies 
against the protein or polypeptide encoded by the nucleic acid insert. Antibodies capable of 
specifically recognizing the protein of interest may be generated using synthetic 15-mer peptides 
having a sequence encoded by the appropriate nucleic acid. The synthetic peptides are injected into 
mice to generate antibody to the polypeptide encoded by the nucleic acid. 

If the proteins or polypeptides encoded by the nucleic acid inserts are secreted, medium 
prepared from the host cells or organisms containing an expression vector which contains a nucleic 
acid insert encoding the desired protein or polypeptide is compared to medium prepared from the 
control cells or organism. The presence of a band in medium from the cells containing the nucleic 
unlabeled proteins or polypeptides may be incubated with the cells and detected with antibodies 
having a detectable label, such as a fluorescent molecule, attached thereto. 

Specificity of cell surface binding may be analyzed by conducting a competition analysis in 
which various amounts of unlabeled protein or polypeptide are incubated along with the labeled 
protein or polypeptide. The amount of labeled protein or polypeptide bound to the cell surface 
decreases as the amount of competitive unlabeled protein or polypeptide increases. As a control, 
various amounts of an unlabeled protein or polypeptide unrelated to the labeled protein or 
polypeptide are included in some binding reactions. The amount of labeled protein or polypeptide 
bound to the cell surface does not decrease in binding reactions containing increasing amounts of 
unrelated unlabeled protein, indicating that the protein or polypeptide encoded by the nucleic acid 
binds specifically to the cell surface. 
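
The expected shape of such a competition analysis can be sketched with a simple single-site occupancy model. This is an assumption introduced for illustration and is not stated in the text: binding is taken to be reversible and competitive at one class of sites, ligand depletion is ignored, and the function name, concentrations and dissociation constants are all hypothetical. An unrelated control protein is modeled as having no affinity for the site (an infinite inhibition constant).

```python
import math


def fraction_bound(labeled: float, kd: float,
                   competitor: float = 0.0, ki: float = math.inf) -> float:
    """Fraction of surface sites occupied by labeled protein.

    Single-site competitive model (an assumption, not taken from the text):
        occupancy = [L] / ([L] + Kd * (1 + [I] / Ki))
    An unrelated protein that does not bind the site has Ki = infinity.
    """
    return labeled / (labeled + kd * (1.0 + competitor / ki))


def titration(labeled: float, kd: float, competitor_levels, ki: float = math.inf):
    """Occupancy by labeled protein at each competitor concentration."""
    return [fraction_bound(labeled, kd, c, ki) for c in competitor_levels]


if __name__ == "__main__":
    levels = [0.0, 1.0, 10.0, 100.0]      # arbitrary concentration units
    # Unlabeled form of the same protein: assumed Ki equal to Kd.
    print(titration(1.0, 1.0, levels, ki=1.0))
    # Unrelated unlabeled protein: no competition expected.
    print(titration(1.0, 1.0, levels))
```

Under these assumptions the first series falls as the competitor concentration rises, while the second stays flat, which is the pattern the control described above is meant to reveal.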

As discussed above, human proteins have been shown to have a number of important 
physiological effects and, consequently, represent a valuable therapeutic resource. The human 
proteins or polypeptides made as described above may be evaluated to determine their physiological 
activities as described below. 

EXAMPLE 24 

Assaying the Expressed Proteins or Polypeptides for Cytokine, 
Cell Proliferation or Cell Differentiation Activity 
As discussed above, some human proteins act as cytokines or may affect cellular 
proliferation or differentiation. Many protein factors discovered to date, including all known 
cytokines, have exhibited activity in one or more factor dependent cell proliferation assays, and 
hence the assays serve as a convenient confirmation of cytokine activity. The activity of a protein or 
polypeptide of the present invention is evidenced by any one of a number of routine factor dependent 
cell proliferation assays for cell lines including, without limitation, 32D, DA2, DA1G, T10, B9, 
B9/11, BaF3, MC9/G, M+ (preB M+), 2E8, RB5, DA1, 123, T1165, HT2, CTLL2, TF-1, Mo7e and 
CMK. The proteins or polypeptides prepared as described above may be evaluated for their ability to 
regulate T cell or thymocyte proliferation in assays such as those described above or in the following 
references: Current Protocols in Immunology, Ed. by J.E. Coligan et al., Greene Publishing 
Associates and Wiley-Interscience; Takai et al., J. Immunol. 137:3494-3500, 1986; Bertagnolli et al., 
J. Immunol. 145:1706-1712, 1990; Bertagnolli et al., Cellular Immunology 133:327-341, 1991; 
Bertagnolli et al., J. Immunol. 149:3778-3783, 1992; and Bowman et al., J. Immunol. 152:1756-1761, 
1994, the entire disclosures of which are incorporated herein by reference. 

activity to influence thymocyte or splenocyte cytotoxicity. Numerous assays for such activity are 
familiar to those skilled in the art including the assays described in the following references: Chapter 
3 (In vitro Assays for Mouse Lymphocyte Function 3.1-3.19) and Chapter 7 (Immunologic studies in 
Humans) in Current Protocols in Immunology, J.E. Coligan et al. Eds, Greene Publishing 
Associates and Wiley-Interscience; Herrmann et al., Proc. Natl. Acad. Sci. USA 78:2488-2492, 1981; 
Herrmann et al., J. Immunol. 128:1968-1974, 1982; Handa et al., J. Immunol. 135:1564-1572, 1985; 
Takai et al., J. Immunol. 137:3494-3500, 1986; Takai et al., J. Immunol. 140:508-512, 1988; 
Bowman et al., J. Virology 61:1992-1998; Bertagnolli et al., Cell. Immunol. 133:327-341, 1991; and 
Brown et al., J. Immunol. 153:3079-3092, 1994, the entire disclosures of which are incorporated 
herein by reference. 

The proteins or polypeptides prepared as described above may also be evaluated for their 
effects on T-cell dependent immunoglobulin responses and isotype switching. Numerous assays for 
such activity are familiar to those skilled in the art, including the assays disclosed in the following 
references: Maliszewski, J. Immunol. 144:3028-3033, 1990, the entire disclosure of which is 
incorporated herein by reference; and Mond et al. in Current Protocols in Immunology, 1:3.8.1-3.8.16, 
supra. 

The proteins or polypeptides prepared as described above may also be evaluated for their 
effect on immune effector cells, including their effect on Th1 cells and cytotoxic lymphocytes. 
Numerous assays for such activity are familiar to those skilled in the art, including the assays 
disclosed in the following references: Chapter 3 (In vitro Assays for Mouse Lymphocyte Function 
3.1-3.19) and Chapter 7 (Immunologic Studies in Humans) in Current Protocols in Immunology, 
supra; Takai et al., J. Immunol. 137:3494-3500, 1986; Takai et al, J. Immunol. 140:508-512, 1988; 
and Bertagnolli et al., J. Immunol. 149:3778-3783, 1992, the entire disclosures of which are 
incorporated herein by reference. 

The proteins or polypeptides prepared as described above may also be evaluated for their 
effect on dendritic cell mediated activation of naive T-cells. Numerous assays for such activity are 
familiar to those skilled in the art, including the assays disclosed in the following references: Guery 
et al., J. Immunol. 134:536-544, 1995; Inaba et al., J. Exp. Med. 173:549-559, 1991; Macatonia et 
al., J. Immunol. 154:5071-5079, 1995; Porgador et al., J. Exp. Med. 182:255-260, 1995; Nair et al., J. 
Virol. 67:4062-4069, 1993; Huang et al., Science 264:961-965, 1994; Macatonia et al., J. Exp. Med. 
169:1255-1264, 1989; Bhardwaj et al., Journal of Clinical Investigation 94:797-807, 1994; and 
Inaba et al., J. Exp. Med. 172:631-640, 1990, the entire disclosures of which are incorporated herein 
by reference. 

The proteins or polypeptides prepared as described above may also be evaluated for their 
influence on the lifetime of lymphocytes. Numerous assays for such activity are familiar to those 
skilled in the art, including the assays disclosed in the following references: Darzynkiewicz et al., 
Cytometry 13:795-808, 1992; Gorczyca et al., Leukemia 7:659-670, 1993; Gorczyca et al., Cancer 
Res. 53:1945-1951, 1993; Itoh et al., Cell 66:233-243, 1991; Zacharchuk, J. Immunol. 145:4037-4045, 
1990; Zamai et al., Cytometry 14:891-897, 1993; and Gorczyca et al., Int. J. Oncol. 1:639-648, 
1992, the entire disclosures of which are incorporated herein by reference. 

The proteins or polypeptides prepared as described above may also be evaluated for their 
influence on early steps of T-cell commitment and development. Numerous assays for such activity 
are familiar to those skilled in the art, including without limitation the assays disclosed in the 
following references: Antica et al., Blood 84:111-117, 1994; Fine et al., Cell. Immunol. 155:111-122, 
1994; Galy et al., Blood 85:2770-2778, 1995; and Toki et al., Proc. Natl. Acad. Sci. USA 
88:7548-7551, 1991, the entire disclosures of which are incorporated herein by reference. 

Those proteins or polypeptides which exhibit activity as immune system regulators 
may then be formulated as pharmaceuticals and used to treat clinical conditions in which regulation 
of immune activity is beneficial. For example, the protein or polypeptide may be useful in the 
treatment of various immune deficiencies and disorders (including severe combined 
immunodeficiency), e.g., in regulating (up or down) growth and proliferation of T and/or B 
lymphocytes, as well as affecting the cytolytic activity of NK cells and other cell populations. These 
immune deficiencies may be genetic or be caused by viral (e.g., HIV) as well as bacterial or fungal 
infections, or may result from autoimmune disorders. More specifically, infectious diseases caused 
by viral, bacterial, fungal or other infection may be treatable using the protein or polypeptide 
including infections by HIV, hepatitis viruses, herpesviruses, mycobacteria, Leishmania spp., 
Plasmodium, and various fungal infections such as candidiasis. Of course, in this regard, a protein or 
polypeptide may also be useful where a boost to the immune system generally may be desirable, i.e., 
in the treatment of cancer. 

Alternatively, the proteins or polypeptides prepared as described above may be used in 
treatment of autoimmune disorders including, for example, connective tissue disease, multiple 
sclerosis, systemic lupus erythematosus, rheumatoid arthritis, autoimmune pulmonary inflammation, 
Guillain-Barre syndrome, autoimmune thyroiditis, insulin dependent diabetes mellitus, myasthenia 
gravis, graft-versus-host disease and autoimmune inflammatory eye disease. Such a protein or 
polypeptide may also be useful in the treatment of allergic reactions and conditions, such as 
asthma (particularly allergic asthma) or other respiratory problems. Other conditions, in which 
immune suppression is desired (including, for example, organ transplantation), may also be treatable 
using the protein or polypeptide. 

Using the proteins or polypeptides of the invention it may also be possible to regulate 
immune responses either up or down. Down regulation may involve inhibiting or blocking an 
immune response already in progress or may involve preventing the induction of an immune 
response. The functions of activated T-cells may be inhibited by suppressing T cell responses or by 
inducing specific tolerance in T cells, or both. Immunosuppression of T cell responses is generally 
an active non-antigen-specific process which requires continuous exposure of the T cells to the 
suppressive agent. Tolerance, which involves inducing non-responsiveness or anergy in T cells, is 
distinguishable from immunosuppression in that it is generally antigen-specific and persists after the 
end of exposure to the tolerizing agent. Operationally, tolerance can be demonstrated by the lack of 
a T cell response upon reexposure to specific antigen in the absence of the tolerizing agent. 

Down regulating or preventing one or more antigen functions (including without limitation 
B lymphocyte antigen functions, such as, for example, B7 costimulation), e.g., preventing high level 
lymphokine synthesis by activated T cells, will be useful in situations of tissue, skin and organ 
transplantation and in graft-versus-host disease (GVHD). For example, blockage of T cell function 
should result in reduced tissue destruction in tissue transplantation. Typically, in tissue transplants, 
rejection of the transplant is initiated through its recognition as foreign by T cells, followed by an 
immune reaction that destroys the transplant. The administration of a molecule which inhibits or 
blocks interaction of a B7 lymphocyte antigen with its natural ligand(s) on immune cells (such as a 
soluble, monomeric form of a peptide having B7-2 activity alone or in conjunction with a monomeric 
form of a peptide having an activity of another B lymphocyte antigen (e.g., B7-1, B7-3) or blocking 
antibody), prior to transplantation, can lead to the binding of the molecule to the natural ligand(s) on 
the immune cells without transmitting the corresponding costimulatory signal. Blocking B 
lymphocyte antigen function in this manner prevents cytokine synthesis by immune cells, such as T 
cells, and thus acts as an immunosuppressant. Moreover, the lack of costimulation may also be 
sufficient to anergize the T cells, thereby inducing tolerance in a subject. Induction of long-term 
tolerance by B lymphocyte antigen-blocking reagents may avoid the necessity of repeated 
administration of these blocking reagents. To achieve sufficient immunosuppression or tolerance in 
a subject, it may also be necessary to block the function of a combination of B lymphocyte antigens. 

The efficacy of particular blocking reagents in preventing organ transplant rejection or 
GVHD can be assessed using animal models that are predictive of efficacy in humans. Examples of 
appropriate systems which can be used include allogeneic cardiac grafts in rats and xenogeneic 
pancreatic islet cell grafts in mice, both of which have been used to examine the immunosuppressive 
the healing of bone fractures and cartilage damage or defects in humans and other animals. Such a 
preparation employing a protein or polypeptide of the invention may have prophylactic use in closed 
as well as open fracture reduction and also in the improved fixation of artificial joints. De novo bone 
synthesis induced by an osteogenic agent contributes to the repair of congenital, trauma induced, or 
oncologic resection induced craniofacial defects, and also is useful in cosmetic plastic surgery. 

A protein or polypeptide of this invention may also be used in the treatment of periodontal 
disease, and in other tooth repair processes. Such agents may provide an environment to attract 
bone-forming cells, stimulate growth of bone-forming cells or induce differentiation of progenitors 
of bone-forming cells. A protein of the invention may also be useful in the treatment of osteoporosis 
or osteoarthritis, such as through stimulation of bone and/or cartilage repair or by blocking 
inflammation or processes of tissue destruction (collagenase activity, osteoclast activity, etc.) 
mediated by inflammatory processes. 

Another category of tissue regeneration activity that may be attributable to the proteins or 
polypeptides encoded by the nucleic acids described above is tendon/ligament formation. A protein 
or polypeptide encoded by the nucleic acids described above, which induces tendon/ligament-like 
tissue or other tissue formation in circumstances where such tissue is not normally formed, has 
application in the healing of tendon or ligament tears, deformities and other tendon or ligament 
defects in humans and other animals. Such a preparation employing a tendon/ligament-like tissue 
inducing protein may have prophylactic use in preventing damage to tendon or ligament tissue, as 
well as use in the improved fixation of tendon or ligament to bone or other tissues, and in repairing 
defects to tendon or ligament tissue. De novo tendon/ligament-like tissue formation induced by a 
protein or polypeptide of the present invention contributes to the repair of tendon or ligament 
defects of congenital, traumatic or other origin and is also useful in cosmetic plastic surgery for 
attachment or repair of tendons or ligaments. The proteins or polypeptides of the present invention 
may provide an environment to attract tendon- or ligament-forming cells, stimulate growth of 
tendon- or ligament-forming cells, induce differentiation of progenitors of tendon- or ligament- 
forming cells, or induce growth of tendon/ligament cells or progenitors ex vivo for return in vivo to 
effect tissue repair. The proteins or polypeptides of the invention may also be useful in the treatment 
of tendinitis, carpal tunnel syndrome and other tendon or ligament defects. The therapeutic 
compositions may also include an appropriate matrix and/or sequestering agent as a carrier as is well 
known in the art. 

The proteins or polypeptides of the present invention may also be useful for proliferation of 
neural cells and for regeneration of nerve and brain tissue, i.e., for the treatment of central and 
peripheral nervous system diseases and neuropathies, as well as mechanical and traumatic disorders, 

EXAMPLE 30 

Assaying the Expressed Proteins or Polypeptides for Regulation of Blood Clotting 
The proteins or polypeptides of the present invention may also be evaluated for their effects 
on blood clotting. Numerous assays for such activity are familiar to those skilled in the art, including 
the assays disclosed in the following references: Linet et al., J. Clin. Pharmacol. 26:131-140, 1986; 
Burdick et al., Thrombosis Res. 45:413-419, 1987; Humphrey et al., Fibrinolysis 5:71-79 (1991); 
and Schaub, Prostaglandins 35:467-474, 1988, the entire disclosures of which are incorporated 
herein by reference. 

Those proteins or polypeptides which are involved in the regulation of blood clotting may 
then be formulated as pharmaceuticals and used to treat clinical conditions in which regulation of 
blood clotting is beneficial. For example, a protein or polypeptide of the invention may also exhibit 
hemostatic or thrombolytic activity. As a result, such a protein or polypeptide is expected to be 
useful in treatment of various coagulation disorders (including hereditary disorders, such as 
hemophilias) or to enhance coagulation and other hemostatic events in treating wounds resulting 
from trauma, surgery or other causes. A protein or polypeptide of the invention may also be useful 
for dissolving or inhibiting formation of thromboses and for treatment and prevention of conditions 
resulting therefrom (such as infarction of cardiac and central nervous system vessels (e.g., stroke)). 
Alternatively, as described in more detail below, nucleic acids encoding blood clotting activity 
proteins or polypeptides or nucleic acids regulating the expression of such proteins or polypeptides 
may be introduced into appropriate host cells to increase or decrease the expression of the proteins or 
polypeptides as desired. 

EXAMPLE 31 

Assaying the Expressed Proteins or Polypeptides for Involvement in 
Receptor/Ligand Interactions 
The proteins or polypeptides of the present invention may also be evaluated for their 
involvement in receptor/ligand interactions. Numerous assays for such involvement are familiar to 
those skilled in the art, including the assays disclosed in the following references: Chapter 7 
(7.28.1-7.28.22) in Current Protocols in Immunology, J.E. Coligan et al. Eds. Greene Publishing Associates 
and Wiley-Interscience; Takai et al., Proc. Natl. Acad. Sci. USA 84:6864-6868, 1987; Bierer et al., J. 
Exp. Med. 168:1145-1156, 1988; Rosenstein et al., J. Exp. Med. 169:149-160, 1989; Stoltenborg et 
al., J. Immunol. Methods 175:59-68, 1994; Stitt et al., Cell 80:661-670, 1995; and Gyuris et al., Cell 
75:791-803, 1993, the entire disclosures of which are incorporated herein by reference. 

EXAMPLE 33 

Assaying the Expressed Proteins or Polypeptides for Tumor Inhibition Activity 
The proteins or polypeptides of the present invention may also be evaluated for tumor 
inhibition activity. In addition to the activities described above for immunological treatment or 
prevention of tumors, a protein or polypeptide of the invention may exhibit other anti-tumor 
activities. A protein or polypeptide may inhibit tumor growth directly or indirectly (such as, for 
example, via ADCC). A protein or polypeptide may exhibit its tumor inhibitory activity by acting on 
tumor tissue or tumor precursor tissue, by inhibiting formation of tissues necessary to support tumor 
growth (such as, for example, by inhibiting angiogenesis), by causing production of other factors, 
agents or cell types which inhibit tumor growth, or by suppressing, eliminating or inhibiting factors, 
agents or cell types which promote tumor growth. Alternatively, as described in more detail below, 
nucleic acids encoding proteins or polypeptides with tumor inhibition activity or nucleic acids 
regulating the expression of such proteins or polypeptides may be introduced into appropriate host 
cells to increase or decrease the expression of the proteins or polypeptides as desired. 

A protein or polypeptide of the invention may also exhibit one or more of the following 
additional activities or effects: inhibiting the growth, infection or function of, or killing, infectious 
agents, including, without limitation, bacteria, viruses, fungi and other parasites; affecting 
(suppressing or enhancing) bodily characteristics, including, without limitation, height, weight, hair 
color, eye color, skin, fat to lean ratio or other tissue pigmentation, or organ or body part size or 
shape (such as, for example, breast augmentation or diminution, change in bone form or shape); 
affecting biorhythms or circadian cycles or rhythms; affecting the fertility of male or female subjects; 
affecting the metabolism, catabolism, anabolism, processing, utilization, storage or elimination of 
dietary fat, lipid, protein, carbohydrate, vitamins, minerals, cofactors or other nutritional factors or 
component(s); affecting behavioral characteristics, including, without limitation, appetite, libido, 
stress, cognition (including cognitive disorders), depression (including depressive disorders) and 
violent behaviors; providing analgesic effects or other pain reducing effects; promoting 
differentiation and growth of embryonic stem cells in lineages other than hematopoietic lineages; 
hormonal or endocrine activity; in the case of enzymes, correcting deficiencies of the enzyme and 
treating deficiency-related diseases; treatment of hyperproliferative disorders (such as, for example, 
psoriasis); immunoglobulin-like activity (such as, for example, the ability to bind antigens or 
complement); and the ability to act as an antigen in a vaccine composition to raise an immune 
response against such protein or another material or entity which is cross-reactive with such protein. 
Alternatively, as described in more detail below, nucleic acids encoding proteins or polypeptides 

Proteins, polypeptides or other molecules interacting with proteins or polypeptides of the 
present invention can be found by a variety of additional techniques. In one method, affinity 
columns containing the protein or polypeptide of the present invention can be constructed. In 
some versions of this method, the affinity column contains chimeric proteins in which the protein 
or polypeptide of the present invention is fused to glutathione S-transferase. A mixture of cellular 
proteins or pool of expressed proteins as described above is applied to the affinity column. 
Molecules interacting with the protein or polypeptide attached to the column can then be isolated 
and analyzed on 2-D electrophoresis gel as described in Ramunsen et al., Electrophoresis, 18, 
588-598 (1997). Alternatively, the molecules retained on the affinity column can be purified by 
electrophoresis based methods and sequenced. The same method can be used to isolate 
antibodies, to screen phage display products, or to screen phage display human antibodies. 

Molecules interacting with the proteins or polypeptides of the present invention can also 
be screened by using an Optical Biosensor as described in Edwards & Leatherbarrow, Analytical 
Biochemistry, 246, 1-6 (1997), the entire disclosure of which is incorporated herein by reference. 
The main advantage of the method is that it allows the determination of the association rate 
between the protein or polypeptide and other interacting molecules. Thus, it is possible to 
specifically select interacting molecules with a high or low association rate. Typically a target 
molecule is linked to the sensor surface (through a carboxymethyl dextran matrix) and a sample of 
test molecules is placed in contact with the target molecules. The binding of a test molecule to 
the target molecule causes a change in the refractive index and/or thickness. This change is 
detected by the Biosensor provided it occurs in the evanescent field (which extends a few 
hundred nanometers from the sensor surface). In these screening assays, the target molecule can 
be one of the proteins or polypeptides of the present invention and the test sample can be a 
collection of proteins, polypeptides or other molecules extracted from tissues or cells, a pool of 
expressed proteins, combinatorial peptide and/or chemical libraries, or phage displayed peptides. 
The tissues or cells from which the test molecules are extracted can originate from any species. 
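
How an association rate shows up in a biosensor trace can be illustrated with a simple kinetic model. The sketch below assumes a 1:1 interaction under pseudo-first-order conditions (test molecule in constant excess), which is a common simplification and is not taken from the text; the function name and all rate constants, concentrations and the maximum response are hypothetical values chosen only to show the trend.

```python
import math


def binding_response(t: float, conc: float, ka: float, kd: float, rmax: float = 1.0) -> float:
    """Sensor response at time t for an assumed 1:1 interaction.

    Model (an assumption for illustration):
        k_obs = ka * C + kd
        R_eq  = Rmax * ka * C / k_obs
        R(t)  = R_eq * (1 - exp(-k_obs * t))
    ka: association rate constant, kd: dissociation rate constant,
    conc: concentration of the test molecule, rmax: response at saturation.
    """
    k_obs = ka * conc + kd
    r_eq = rmax * ka * conc / k_obs
    return r_eq * (1.0 - math.exp(-k_obs * t))


if __name__ == "__main__":
    # Two hypothetical test molecules with the same affinity (kd / ka)
    # but a tenfold difference in association rate.
    for label, ka, kd in (("fast", 1e6, 1e-2), ("slow", 1e5, 1e-3)):
        trace = [round(binding_response(t, 1e-7, ka, kd), 3) for t in (0, 10, 60, 600)]
        print(label, trace)
```

In this toy comparison both molecules tend toward the same plateau, but the one with the higher association rate constant approaches it sooner, which is the kind of difference that allows interacting molecules to be selected by association rate.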

In other methods, a target protein or polypeptide is immobilized and the test population is 
a collection of unique proteins or polypeptides of the present invention. 

To study the interaction of the proteins or polypeptides of the present invention with 
drugs, the microdialysis coupled to HPLC method described by Wang et al., Chromatographia, 
44, 205-208 (1997) or the affinity capillary electrophoresis method described by Busch et al., J. 
Chromatogr. 777:311-328 (1997), the entire disclosures of which are incorporated herein by 
reference, can be used. 

nucleic acid in the sample is denatured and the PCR primers are specifically hybridized to 
complementary nucleic acid sequences in the sample. The hybridized primers are extended. 
Thereafter, another cycle of denaturation, hybridization, and extension is initiated. The cycles are 
repeated multiple times to produce an amplified fragment containing the nucleic acid sequence 
between the primer sites. 
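
The effect of repeating the cycles can be summarized numerically. The following is a minimal sketch which assumes an idealized, constant per-cycle efficiency E, so that the expected copy number after n cycles is N0(1 + E)^n. The efficiency parameter and the function name are hypothetical and are not specified in this example.

```python
def amplified_copies(initial_copies, cycles, efficiency=1.0):
    """Expected copy number after `cycles` rounds of denaturation,
    hybridization and extension, assuming a constant efficiency in [0, 1]."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1.0 + efficiency) ** cycles

# With perfect doubling, one template yields 2**30 copies after 30 cycles.
ideal_yield = amplified_copies(1, 30)
```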

EXAMPLE 37 

Use of the EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments 
of positional segments of EST-related nucleic acids as probes 
Probes derived from EST-related nucleic acids, positional segments of EST-related nucleic 
acids or fragments of positional segments of EST-related nucleic acids may be labeled with 
detectable labels familiar to those skilled in the art, including radioisotopes and non-radioactive 
labels, to provide a detectable probe. The detectable probe may be single stranded or double 
stranded and may be made using techniques known in the art, including in vitro transcription, nick 
translation, or kinase reactions. A nucleic acid sample containing a sequence capable of hybridizing 
to the labeled probe is contacted with the labeled probe. If the nucleic acid in the sample is double 
stranded, it may be denatured prior to contacting the probe. In some applications, the nucleic acid 
sample may be immobilized on a surface such as a nitrocellulose or nylon membrane. The nucleic 
acid sample may comprise nucleic acids obtained from a variety of sources, including genomic 
DNA, cDNA libraries, RNA, or tissue samples. 

Procedures used to detect the presence of nucleic acids capable of hybridizing to the 
detectable probe include well known techniques such as Southern blotting, Northern blotting, dot 
blotting, colony hybridization, and plaque hybridization. In some applications, the nucleic acid 
capable of hybridizing to the labeled probe may be cloned into vectors such as expression vectors, 
sequencing vectors, or in vitro transcription vectors to facilitate the characterization and expression 
of the hybridizing nucleic acids in the sample. For example, such techniques may be used to isolate 
and clone sequences in a genomic library or cDNA library which are capable of hybridizing to the 
detectable probe as described in Example 20 above. 

PCR primers made as described in Example 36 above may be used in forensic analyses, 
such as the DNA fingerprinting techniques described in Examples 38-42 below. Such analyses may 
utilize detectable probes or primers based on the sequences of the EST-related nucleic acids, 
positional segments of EST-related nucleic acids or fragments of positional segments of EST-related 
nucleic acids. 

contains 200 amplified sequences. This PCR-generated DNA is then digested with one or a 
combination of, preferably, four base specific restriction enzymes. Such enzymes are commercially 
available and known to those of skill in the art. After digestion, the resultant gene fragments are size 
separated in multiple duplicate wells on an agarose gel and transferred to nitrocellulose using 
Southern blotting techniques well known to those with skill in the art. For a review of Southern 
blotting see Davis et al. (Basic Methods in Molecular Biology, 1986, Elsevier Press, pp 62-65), the 
entire disclosure of which is incorporated herein by reference. 

A panel of probes based on the sequences of the EST-related nucleic acids, positional 
segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic 
acids are radioactively or colorimetrically labeled using methods known in the art, such as nick 
translation or end labeling, and hybridized to the Southern blot using techniques known in the art 
(Davis et al., supra). Preferably, the probes are at least 10, 12, 15, 18, 20, 25, 28, 30, 35, 40, 50, 75, 
100, 150, 200, 300, 400 or 500 nucleotides in length. In some 
embodiments, the probes are oligonucleotides which are 40 nucleotides in length or less. 

Preferably, at least 5 to 10 of these labeled probes are used, and more preferably at least 
about 20 or 30 are used to provide a unique pattern. The resultant bands appearing from the 
hybridization of a large sample of EST-related nucleic acids, positional segments of EST-related 
nucleic acids or fragments of positional segments of EST-related nucleic acids will be a unique 
identifier. Since the restriction enzyme cleavage will be different for every individual, the band 
pattern on the Southern blot will also be unique. Increasing the number of probes will provide a 
statistically higher level of confidence in the identification since there will be an increased number of 
sets of bands used for identification. 
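
The statement that additional probes increase the level of confidence can be illustrated with a simple calculation. The sketch below assumes that each probe matches between two unrelated individuals with the same probability and that probes behave independently. Both assumptions, and the numerical value used, are hypothetical and serve only to show the trend.

```python
def random_match_probability(per_probe_match, n_probes):
    """Chance that two unrelated individuals show the same band pattern for
    every probe, assuming independent probes with equal match probability."""
    if not 0.0 <= per_probe_match <= 1.0:
        raise ValueError("per_probe_match must lie between 0 and 1")
    if n_probes < 0:
        raise ValueError("n_probes must be non-negative")
    return per_probe_match ** n_probes

# Hypothetical per-probe match probability of 0.5, for illustration only.
with_10_probes = random_match_probability(0.5, 10)
with_20_probes = random_match_probability(0.5, 20)
```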



EXAMPLE 41 
Dot Blot Identification Procedure 

Another technique for identifying individuals using the EST-related nucleic acids, positional 
segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic 
acids disclosed herein utilizes a dot blot hybridization technique. 

Genomic DNA is isolated from nuclei of the subject to be identified. Probes are prepared that 
correspond to at least 10, preferably 50, sequences from the EST-related nucleic acids, positional 
segments of EST-related nucleic acids or fragments of positional segments of EST-related nucleic 
acids. The probes are hybridized to the genomic DNA under conditions known to those in 
the art. The oligonucleotides are end labeled with 32P using polynucleotide kinase (Pharmacia). Dot 
blots are created by spotting the genomic DNA onto nitrocellulose or the like using a vacuum dot 
blot manifold (BioRad, Richmond, California). The genomic DNA is fixed to the nitrocellulose 
filter by baking or UV cross-linking, and the filter is prehybridized and hybridized with labeled probe using 
techniques known in the art (Davis et al., supra). The 32P labeled DNA fragments are sequentially 
hybridized under successively more stringent conditions to detect minimal differences between the 30 bp 
sequence and the DNA. Tetramethylammonium chloride is useful for identifying clones containing 
small numbers of nucleotide mismatches (Wood et al., Proc. Natl. Acad. Sci. USA 82(6):1585-1588 
(1985), the entire disclosure of which is incorporated herein by reference). A unique pattern of dots 
distinguishes one individual from another individual. 

EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of 
positional segments of EST-related nucleic acids can be used as probes in the following alternative 
fingerprinting technique. In some embodiments, the probes are oligonucleotides which are 40 
nucleotides in length or less. 

Preferably, a plurality of probes having sequences from different EST-related nucleic acids, 
positional segments of EST-related nucleic acids or fragments of positional segments of EST-related 
nucleic acids are used in the alternative fingerprinting technique. Example 42 below provides a 
representative alternative fingerprinting procedure in which the probes are derived from EST-related 
nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments 
of EST-related nucleic acids. 

EXAMPLE 42 

Alternative "Fingerprint" Identification Technique 
Oligonucleotides are prepared from a large number, e.g. 50, 100, or 200, EST-related nucleic 
acids, positional segments of EST-related nucleic acids or fragments of positional segments of EST- 
related nucleic acids using commercially available oligonucleotide services such as Genset, Paris, 
France. Preferably, the oligonucleotides are at least 10, 15, 18, 20, 23, 25, 28, or 30 nucleotides in 
length. However, in some embodiments, the oligonucleotides may be more than 40, 50, 60 or 70 
nucleotides in length. 

Cell samples from the test subject are processed for DNA using techniques well known to 
those with skill in the art. The nucleic acid is digested with restriction enzymes such as EcoRI and 
XbaI. Following digestion, samples are applied to wells for electrophoresis. The procedure, as 
known in the art, may be modified to accommodate polyacrylamide electrophoresis; however, in this 
example, samples containing 5 µg of DNA are loaded into wells and separated on 0.8% agarose gels. 
The gels are transferred onto nitrocellulose using standard Southern blotting techniques. 
10 ng of each of the oligonucleotides are pooled and end-labeled with 32P. The 
nitrocellulose is prehybridized with blocking solution and hybridized with the labeled probes. 
Following hybridization and washing, the nitrocellulose filter is exposed to X-Omat AR X-ray film. 
The resulting hybridization pattern will be unique for each individual. 
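
The digestion and probing steps of this example can be modeled in silico. The following is a minimal sketch which assumes the commonly cited recognition sites for EcoRI (G^AATTC) and XbaI (T^CTAGA), scans the top strand only, and treats a probe as hybridizing when its sequence occurs exactly within a fragment. The demonstration sequence and the function names are hypothetical.

```python
# Recognition site and cut offset within the site (top strand).
SITES = {"EcoRI": ("GAATTC", 1), "XbaI": ("TCTAGA", 1)}

def cut_positions(seq, enzymes):
    """Sorted cut coordinates for the named enzymes on the top strand."""
    cuts = set()
    for name in enzymes:
        site, offset = SITES[name]
        start = seq.find(site)
        while start != -1:
            cuts.add(start + offset)
            start = seq.find(site, start + 1)
    return sorted(cuts)

def digest(seq, enzymes):
    """Fragments of a linear sequence after complete digestion."""
    bounds = [0] + cut_positions(seq, enzymes) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]

def fragments_hit_by_probe(fragments, probe):
    """Sizes of fragments containing an exact copy of the probe sequence."""
    return [len(f) for f in fragments if probe in f]

# Hypothetical sequence with one EcoRI site and one XbaI site.
demo = "AAAA" + "GAATTC" + "CCCCCCCC" + "TCTAGA" + "GGGG"
demo_fragments = digest(demo, ["EcoRI", "XbaI"])
demo_bands = fragments_hit_by_probe(demo_fragments, "CCCCCCCC")
```

In this simplified model, individuals whose restriction sites differ would yield different fragment sizes for the same probe, which corresponds to a different band position on the blot.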

It is additionally contemplated within this example that the number of probe sequences used 
can be varied for additional accuracy or clarity. 

In addition to their applications in forensics and identification, EST-related nucleic acids, 
positional segments of EST-related nucleic acids or fragments of positional segments of EST-related 
nucleic acids may be mapped to their chromosomal locations. Example 41 below describes 
radiation hybrid (RH) mapping of human chromosomal regions using EST-related nucleic acids, 
positional segments of EST-related nucleic acids or fragments of positional segments of EST-related 
nucleic acids. Example 42 below describes a representative procedure for mapping EST-related 
nucleic acids, positional segments of EST-related nucleic acids or fragments of positional segments 
of EST-related nucleic acids to their locations on human chromosomes. Example 43 below 
describes mapping of EST-related nucleic acids, positional segments of EST-related nucleic acids or 
fragments of positional segments of EST-related nucleic acids on metaphase chromosomes by 
Fluorescence In Situ Hybridization (FISH). 

2. Use of EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments 
of positional segments of EST-related nucleic acids in Chromosome Mapping 

EXAMPLE 43 

Radiation hybrid mapping of EST-related nucleic acids, positional segments of 
EST-related nucleic acids or fragments of positional segments of 
EST-related nucleic acids to the human genome 
Radiation hybrid (RH) mapping is a somatic cell genetic approach that can be used for high 
resolution mapping of the human genome. In this approach, cell lines containing one or more human 
chromosomes are lethally irradiated, breaking each chromosome into fragments whose size depends 
on the radiation dose. These fragments are rescued by fusion with cultured rodent cells, yielding 
subclones containing different portions of the human genome. This technique is described by 
Benham et al. (Genomics 4:509-517, 1989) and Cox et al. (Science 250:245-250, 1990), the entire 
disclosures of which are incorporated herein by reference. The random and independent nature of 
the subclones permits efficient mapping of any human genome marker. Human DNA isolated from 
a panel of 80-100 cell lines provides a mapping reagent for ordering EST-related nucleic acids, 
positional segments of EST-related nucleic acids or fragments of positional segments of EST-related 
nucleic acids. In this approach, the frequency of breakage between markers is used to measure 
distance, allowing construction of fine resolution maps as has been done using conventional ESTs 
(Schuler et al, Science 274:540-546, 1996), the entire disclosure of which is incorporated herein by 
reference. 
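
The use of breakage frequency as a measure of distance can be illustrated as follows. The sketch assumes a two-point estimator of the form commonly used in radiation hybrid analysis, theta = (discordant hybrids) / [T(RA + RB - 2 RA RB)], where T is the number of hybrids and RA and RB are the retention fractions of the two markers, together with the conversion D = -100 ln(1 - theta) centiRays. The estimator, the retention patterns and the function names are assumptions made for illustration and are not recited in this specification.

```python
import math

def breakage_frequency(a, b):
    """Two-point breakage frequency from retention patterns.

    `a` and `b` are strings of '1' (marker retained) and '0' (marker absent),
    one character per hybrid cell line. The estimate is capped at 1.0.
    """
    if len(a) != len(b) or not a:
        raise ValueError("patterns must be non-empty and of equal length")
    total = len(a)
    discordant = sum(x != y for x, y in zip(a, b))
    ra = a.count("1") / total
    rb = b.count("1") / total
    denom = total * (ra + rb - 2 * ra * rb)
    if denom == 0:
        raise ValueError("retention fractions give no information")
    return min(discordant / denom, 1.0)

def centirays(theta):
    """Map distance in centiRays; infinite when markers appear unlinked."""
    if theta >= 1.0:
        return math.inf
    return -100.0 * math.log(1.0 - theta)

# Hypothetical retention patterns across eight hybrids.
theta_demo = breakage_frequency("11110000", "11100000")
distance_demo = centirays(theta_demo)
```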

RH mapping has been used to generate a high-resolution whole genome radiation hybrid 
map of human chromosome 17q22-q25.3 across the genes for growth hormone (GH) and thymidine 
kinase (TK) (Foster et al., Genomics 33:185-192, 1996), the region surrounding the Gorlin syndrome 
gene (Obermayr et al., Eur. J. Hum. Genet. 4:242-245, 1996), 60 loci covering the entire short arm of 
chromosome 12 (Raeymaekers et al, Genomics 29:170-178, 1995), the region of human 
chromosome 22 containing the neurofibromatosis type 2 locus (Frazer et al., Genomics 14:574-584, 
1992) and 13 loci on the long arm of chromosome 5 (Warrington et al, Genomics 11:701-708, 
1991), the entire disclosures of which are incorporated herein by reference. 

EXAMPLE 44 

Mapping of EST-related nucleic acids, positional segments of 
EST-related nucleic acids or fragments of positional segments of 
EST-related nucleic acids to Human Chromosomes using PCR techniques 
EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of 
positional segments of EST-related nucleic acids may be assigned to human chromosomes using 
PCR based methodologies. In such approaches, oligonucleotide primer pairs are designed from 
EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of 
positional segments of EST-related nucleic acids to minimize the chance of amplifying through an 
intron. Preferably, the oligonucleotide primers are 18-23 bp in length and are designed for PCR 
amplification. The creation of PCR primers from known sequences is well known to those with skill 
in the art. For a review of PCR technology see Erlich, in PCR Technology: Principles and 
Applications for DNA Amplification, 1992, W.H. Freeman and Co., New York, the entire disclosure 
of which is incorporated herein by reference. 
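
As an illustration of screening candidate primers against the preferred length range, the sketch below checks that a primer is 18-23 nucleotides long and estimates its melting temperature with the Wallace rule, Tm = 2(A+T) + 4(G+C). The Wallace rule is a rough approximation introduced here as an assumption; this example does not prescribe a method for estimating Tm, and the primer shown is hypothetical.

```python
def is_preferred_length(primer, min_len=18, max_len=23):
    """True when the primer falls within the preferred 18-23 nt range."""
    return min_len <= len(primer) <= max_len

def wallace_tm(primer):
    """Approximate melting temperature in degrees C by the Wallace rule."""
    seq = primer.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("primer must contain only A, C, G and T")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc

# Hypothetical 20-mer used only to exercise the functions.
candidate = "ATGCATGCATGCATGCATGC"
candidate_ok = is_preferred_length(candidate)
candidate_tm = wallace_tm(candidate)
```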

The primers are used in polymerase chain reactions (PCR) to amplify templates from total 
human genomic DNA. PCR conditions are as follows: 60 ng of genomic DNA is used as a template 
for PCR with 80 ng of each oligonucleotide primer, 0.6 unit of Taq polymerase, and 1 µCi of a 
32P-labeled deoxycytidine triphosphate. The PCR is performed in a microplate thermocycler (Techne) 
under the following conditions: 30 cycles of 94°C, 1.4 min; 55°C, 2 min; and 72°C, 2 min; with a 
final extension at 72°C for 10 min. The amplified products are analyzed on a 6% polyacrylamide 
sequencing gel and visualized by autoradiography. If the length of the resulting PCR product is 

In a preferred embodiment, chromosomal localization of EST-related nucleic acids, 
positional segments of EST-related nucleic acids or fragments of positional segments of EST-related 
nucleic acids is obtained by FISH as described by Cherif et al. (Proc. Natl. Acad. Sci. USA, 
87:6639-6643, 1990, the entire disclosure of which is incorporated herein by reference). Metaphase 
chromosomes are prepared from phytohemagglutinin (PHA)-stimulated blood cell donors. 
PHA-stimulated lymphocytes from healthy males are cultured for 72 h in RPMI-1640 medium. For 
synchronization, methotrexate (10 µM) is added for 17 h, followed by addition of 
5-bromodeoxyuridine (5-BrdU, 0.1 mM) for 6 h. Colcemid (1 ng/ml) is added for the last 15 min 
before harvesting the cells. Cells are collected, washed in RPMI, incubated with a hypotonic 
solution of KCl (75 mM) at 37°C for 15 min and fixed in three changes of methanol:acetic acid (3:1). 
The cell suspension is dropped onto a glass slide and air dried. The EST-related nucleic acids, 
positional segments of EST-related nucleic acids or fragments of positional segments of EST-related 
nucleic acids are labeled with biotin-16 dUTP by nick translation according to the manufacturer's 
instructions (Bethesda Research Laboratories, Bethesda, MD), purified using a Sephadex G-50 
column (Pharmacia, Uppsala, Sweden) and precipitated. Just prior to hybridization, the DNA pellet is 
dissolved in hybridization buffer (50% formamide, 2 X SSC, 10% dextran sulfate, 1 mg/ml sonicated 
salmon sperm DNA, pH 7) and the probe is denatured at 70°C for 5-10 min. 

Slides kept at -20°C are treated for 1 h at 37°C with RNase A (100 µg/ml), rinsed three 
times in 2 X SSC and dehydrated in an ethanol series. Chromosome preparations are denatured in 
70% formamide, 2 X SSC for 2 min at 70°C, then dehydrated at 4°C. The slides are treated with 
proteinase K (10 µg/100 ml in 20 mM Tris-HCl, 2 mM CaCl2) at 37°C for 8 min and dehydrated. 
The hybridization mixture containing the probe is placed on the slide, covered with a coverslip, 
sealed with rubber cement and incubated overnight in a humid chamber at 37°C. After hybridization 
and post-hybridization washes, the biotinylated probe is detected by avidin-FITC and amplified with 
additional layers of biotinylated goat anti-avidin and avidin-FITC. For chromosomal localization, 
fluorescent R-bands are obtained as previously described (Cherif et al., supra). The slides are 
observed under a LEICA fluorescence microscope (DMRXA). Chromosomes are counterstained 
with propidium iodide and the fluorescent signal of the probe appears as two symmetrical 
yellow-green spots on both chromatids of the fluorescent R-band chromosome (red). Thus, particular 
EST-related nucleic acids, positional segments of EST-related nucleic acids or fragments of 
positional segments of EST-related nucleic acids may be localized to a particular cytogenetic R-band 
on a given chromosome. 

nucleic acids or fragments of positional segments of EST-related nucleic acids or a primer 
corresponding to a sequence included in the cloning vector. The resulting double stranded DNA is 
transformed into bacteria. cDNAs containing the EST-related nucleic acids, positional segments of 
EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids are 
identified by colony PCR or colony hybridization. 

Once the upstream genomic sequences have been cloned and sequenced as described above, 
prospective promoters and transcription start sites within the upstream sequences may be identified 
by comparing the sequences upstream of the EST-related nucleic acids, positional segments of EST- 
related nucleic acids or fragments of positional segments of EST-related nucleic acids with databases 
containing known transcription start sites, transcription factor binding sites, or promoter sequences. 

In addition, promoters in the upstream sequences may be identified using promoter reporter 
vectors as described in Example 54. 

EXAMPLE 54 

Identification of Promoters in Cloned Upstream Sequences 
The genomic sequences upstream of the EST-related nucleic acids, positional segments of 
EST-related nucleic acids or fragments of positional segments of EST-related nucleic acids are 
cloned into a suitable promoter reporter vector, such as the pSEAP-Basic, pSEAP-Enhancer, 
pβ-gal-Basic, pβ-gal-Enhancer, or pEGFP-1 Promoter Reporter vectors available from Clontech. Briefly, 
each of these promoter reporter vectors includes multiple cloning sites positioned upstream of a 
reporter gene encoding a readily assayable protein such as secreted alkaline phosphatase, 
β-galactosidase, or green fluorescent protein. The sequences upstream of the EST-related nucleic 
acids, positional segments of EST-related nucleic acids or fragments of positional segments of 
EST-related nucleic acids are inserted into the cloning sites upstream of the reporter gene in both 
orientations and introduced into an appropriate host cell. The level of reporter protein is assayed and 
compared to the level obtained from a vector which lacks an insert in the cloning site. The presence 
of an elevated expression level in the vector containing the insert with respect to the control vector 
indicates the presence of a promoter in the insert. If necessary, the upstream sequences can be cloned 
into vectors which contain an enhancer for augmenting transcription levels from weak promoter 
sequences. A significant level of expression above that observed with the vector lacking an insert 
indicates that a promoter sequence is present in the inserted upstream sequence. 
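
The comparison between the insert-containing vector and the insert-free control can be expressed as a fold-over-control ratio. The sketch below is illustrative only: the threshold used to call a level of expression significant is a hypothetical parameter, as this example does not specify a numerical cutoff, and the signal values are invented.

```python
def fold_over_control(insert_signal, control_signal):
    """Reporter signal of the insert-containing vector relative to the
    vector lacking an insert."""
    if control_signal <= 0:
        raise ValueError("control signal must be positive")
    return insert_signal / control_signal

def has_promoter_activity(insert_signal, control_signal, threshold=2.0):
    """True when the fold increase meets a chosen (hypothetical) threshold."""
    return fold_over_control(insert_signal, control_signal) >= threshold

# Hypothetical reporter readings for an insert cloned in both orientations.
readings = {"forward": 500.0, "reverse": 120.0}
control_reading = 100.0
calls = {orientation: has_promoter_activity(signal, control_reading)
         for orientation, signal in readings.items()}
```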

Appropriate host cells for the promoter reporter vectors may be chosen based on the results 
of the above described determination of expression patterns of the EST-related nucleic acids, 
positional segments of EST-related nucleic acids or fragments of positional segments of EST-related 
nucleic acids. For example, if the expression pattern analysis indicates that the mRNA 
corresponding to a particular EST-related nucleic acids, positional segments of EST-related nucleic 
acids or fragments of positional segments of EST-related nucleic acids is expressed in fibroblasts, the 
promoter reporter vector may be introduced into a human fibroblast cell line. 

Promoter sequences within the upstream genomic DNA may be further defined by 
constructing nested deletions in the upstream DNA using conventional techniques such as 
Exonuclease III digestion. The resulting deletion fragments can be inserted into the promoter 
reporter vector to determine whether the deletion has reduced or obliterated promoter activity. In this 
way, the boundaries of the promoters may be defined. If desired, potential individual regulatory sites 
within the promoter may be identified using site directed mutagenesis or linker scanning to obliterate 
potential transcription factor binding sites within the promoter individually or in combination. The 
effects of these mutations on transcription levels may be determined by inserting the mutations into 
the cloning sites in the promoter reporter vectors. 

EXAMPLE 55 

Cloning and Identification of Promoters 
Using the method described in Example 54 above with 5' ESTs, sequences upstream of 
several genes were obtained. Using the primer pairs GGG AAG ATG GAG ATA GTA TTG CCT 
G (SEQ ID NO: 15) and CTG CCA TGT ACA TGA TAG AGA GAT TC (SEQ ID NO: 16), the 
promoter having the internal designation P13H2 (SEQ ID NO: 17) was obtained. 

Using the primer pairs GTA CCA GGG ACT GTG ACC ATT GC (SEQ ID NO: 18) and 
CTG TGA CCA TTG CTC CCA AGA GAG (SEQ ID NO: 19), the promoter having the internal 
designation P15B4 (SEQ ID NO:20) was obtained. 

Using the primer pairs CTG GGA TGG AAG GCA CGG TA (SEQ ID NO:21) and GAG 
ACC ACA CAG CTA GAC AA (SEQ ID NO:22), the promoter having the internal designation 
P29B6 (SEQ ID NO:23) was obtained. 

Figure 4 provides a schematic description of the promoters isolated and the way they are 
assembled with the corresponding 5' tags. The upstream sequences were screened for the presence 
of motifs resembling transcription factor binding sites or known transcription start sites using the 
computer program MatInspector release 2.0, August 1996. 

The transcription factor binding sites present in each of these promoters are listed as protein 
binding sites in the sequence listing for SEQ ID NOs: 17, 20, and 23. 

Alternatively, oligonucleotides which are complementary to the strand normally transcribed 
in the cell may be synthesized in vitro. Thus, the antisense nucleic acids are complementary to the 
corresponding mRNA and are capable of hybridizing to the mRNA to create a duplex. In some 
embodiments, the antisense sequences may contain modified sugar phosphate backbones to increase 
stability and make them less sensitive to RNase activity. Examples of modifications suitable for use 
in antisense strategies are described by Rossi et al., Pharmacol. Ther. 50(2):245-254, (1991), the 
entire disclosure of which is incorporated herein by reference. 
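
The relationship between an mRNA segment and an antisense oligonucleotide capable of forming a duplex with it can be illustrated by computing the reverse complement. The following minimal sketch returns an antisense oligodeoxynucleotide written 5' to 3'. The function name and the input segment are hypothetical, and the sketch does not model the backbone modifications mentioned above.

```python
# Watson-Crick partners, returned in the DNA alphabet.
COMPLEMENT = {"A": "T", "U": "A", "T": "A", "G": "C", "C": "G"}

def antisense_oligo(mrna_segment):
    """Antisense DNA oligonucleotide (5' to 3') complementary to an mRNA
    segment that is itself written 5' to 3'."""
    seq = mrna_segment.upper()
    unknown = set(seq) - set(COMPLEMENT)
    if unknown:
        raise ValueError("unexpected characters: " + "".join(sorted(unknown)))
    return "".join(COMPLEMENT[base] for base in reversed(seq))

# Hypothetical mRNA segment used only to exercise the function.
oligo = antisense_oligo("AUGGCC")
```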

Various types of antisense oligonucleotides complementary to the sequence of the EST- 
related nucleic acids, positional segments of EST-related nucleic acids or fragments of positional 
segments of EST-related nucleic acids may be used. In one preferred embodiment, stable and semi- 
stable antisense oligonucleotides described in International Application No. PCT WO94/23026, the 
entire disclosure of which is incorporated herein by reference, are used. In these molecules, the 3' 
end or both the 3' and 5' ends are engaged in intramolecular hydrogen bonding between 
complementary base pairs. These molecules are better able to withstand exonuclease attacks and 
exhibit increased stability compared to conventional antisense oligonucleotides. 

In another preferred embodiment, the antisense oligodeoxynucleotides against herpes 
simplex virus types 1 and 2 described in International Application No. WO 95/04141, the entire 
disclosure of which is incorporated herein by reference, are used. 

In yet another preferred embodiment, the covalently cross-linked antisense oligonucleotides 
described in International Application No. WO 96/31523, the entire disclosure of which is 
incorporated herein by reference, are used. These double- or single-stranded oligonucleotides 
comprise one or more, respectively, inter- or intra-oligonucleotide covalent cross-linkages, wherein 
the linkage consists of an amide bond between a primary amine group of one strand and a carboxyl 
group of the other strand or of the same strand, respectively, the primary amine group being directly 
substituted in the 2' position of the strand nucleotide monosaccharide ring, and the carboxyl group 
being carried by an aliphatic spacer group substituted on a nucleotide or nucleotide analog of the 
other strand or the same strand, respectively. 

The antisense oligodeoxynucleotides and oligonucleotides disclosed in International 
Application No. WO 92/18522, the entire disclosure of which is incorporated herein by reference, 
may also be used. These molecules are stable to degradation and contain at least one transcription 
control recognition sequence which binds to control proteins and are effective as decoys therefor. 
These molecules may contain "hairpin" structures, "dumbbell" structures, "modified dumbbell" 
structures, "cross-linked" decoy structures and "loop" structures. 

In some embodiments in which the protein or polypeptide is secreted, nucleic acids encoding 
the full length protein (i.e. the signal peptide and the mature protein), or nucleic acids encoding only 
the mature protein (i.e. the protein generated when the signal peptide is cleaved off) are introduced 
into the host organism. 

The nucleic acids encoding the proteins or polypeptides may be introduced into the host 
organism using a variety of techniques known to those of skill in the art. For example, the extended 
cDNA may be injected into the host organism as naked DNA such that the encoded protein is 
expressed in the host organism, thereby producing a beneficial effect. 

Alternatively, the nucleic acids encoding the protein or polypeptide may be cloned into an 
expression vector downstream of a promoter which is active in the host organism. The expression 
vector may be any of the expression vectors designed for use in gene therapy, including viral or 
retroviral vectors. The expression vector may be directly introduced into the host organism such that 
the encoded protein is expressed in the host organism to produce a beneficial effect. In another 
approach, the expression vector may be introduced into cells in vitro. Cells containing the 
expression vector are thereafter selected and introduced into the host organism, where they express 
the encoded protein or polypeptide to produce a beneficial effect. 

EXAMPLE 60 

Use of Signal Peptides To Import Proteins Into Cells 
The short core hydrophobic region (h) of signal peptides encoded by the sequences of SEQ 
ID NOs. 24-728 and 766-792 may also be used as a carrier to import a peptide or a protein of 
interest, so-called cargo, into tissue culture cells (Lin et al., J. Biol. Chem., 270: 14225-14258 (1995); 
Du et al., J. Peptide Res., 51: 235-243 (1998); and Rojas et al., Nature Biotech., 16: 370-375 (1998), 
the entire disclosures of which are incorporated herein by reference). 

When cell permeable peptides of limited size (approximately up to 25 amino acids) are to be 
translocated across the cell membrane, chemical synthesis may be used in order to add the h region to 
either the C-terminus or the N-terminus of the cargo peptide of interest. Alternatively, when longer 
peptides or proteins are to be imported into cells, nucleic acids can be genetically engineered, using 
techniques familiar to those skilled in the art, in order to link the extended cDNA sequence encoding 
the h region to the 5' or the 3' end of a DNA sequence coding for a cargo polypeptide. Such 
genetically engineered nucleic acids are then translated either in vitro or in vivo after transfection into 
appropriate cells, using conventional techniques to produce the resulting cell permeable polypeptide. 
Suitable host cells are then simply incubated with the cell permeable polypeptide which is then 
translocated across the membrane. 
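
By way of illustration, a candidate h region may be located within a signal peptide and joined to a cargo sequence as sketched below. The sketch assumes that the h region can be approximated as the window with the highest mean Kyte-Doolittle hydropathy. This criterion, the window width, the peptide shown and the function names are assumptions made for illustration; this example does not specify how the h region is delineated.

```python
# Kyte-Doolittle hydropathy values for the twenty standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def most_hydrophobic_window(peptide, width=8):
    """Return (start, window, mean hydropathy) for the most hydrophobic
    window; the first such window wins when scores tie."""
    if len(peptide) < width:
        raise ValueError("peptide is shorter than the window width")
    best_start, best_score = 0, float("-inf")
    for i in range(len(peptide) - width + 1):
        score = sum(KD[aa] for aa in peptide[i:i + width]) / width
        if score > best_score:
            best_start, best_score = i, score
    return best_start, peptide[best_start:best_start + width], best_score

def attach_h_region(h_region, cargo, terminus="N"):
    """Join the h region to the N- or C-terminus of a cargo sequence."""
    if terminus == "N":
        return h_region + cargo
    if terminus == "C":
        return cargo + h_region
    raise ValueError("terminus must be 'N' or 'C'")

# Hypothetical peptide used only to exercise the functions.
start, h_region, score = most_hydrophobic_window("KKKKLLLLLLLLKKKK")
```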

This method may be applied to study diverse intracellular functions and cellular processes. 
For instance, it has been used to probe functionally relevant domains of intracellular proteins and to 
examine protein-protein interactions involved in signal transduction pathways (Lin et al., supra; Lin 
et al., J. Biol. Chem., 271: 5305-5308 (1996); Rojas et al., J. Biol. Chem., 271: 27456-27461 (1996); 
Liu et al., Proc. Natl. Acad. Sci. USA, 93: 11819-11824 (1996); Rojas et al., Bioch. Biophys. Res. 
Commun., 234: 675-680 (1997), the entire disclosures of which are incorporated herein by reference). 

Such techniques may be used in cellular therapy to import proteins producing therapeutic 
effects. For instance, cells isolated from a patient may be treated with imported therapeutic proteins 
and then re-introduced into the host organism. 

Alternatively, the h region of signal peptides of the present invention could be used in 
combination with a nuclear localization signal to deliver nucleic acids into the cell nucleus. Such 
oligonucleotides may be antisense oligonucleotides or oligonucleotides designed to form triple 
helixes, as described above, in order to inhibit processing and maturation of a target cellular RNA. 

EXAMPLE 61 

Computer Embodiments 
As used herein the term "nucleic acid codes of SEQ ID NOs. 24-811 and 1600-1622" 
encompasses the nucleotide sequences of SEQ ID NOs. 24-811 and 1600-1622, fragments of SEQ 
ID NOs. 24-811 and 1600-1622, nucleotide sequences homologous to SEQ ID NOs. 24-811 and 
1600-1622 or homologous to fragments of SEQ ID NOs. 24-811 and 1600-1622, and sequences 
complementary to all of the preceding sequences. The fragments include portions of SEQ ID NOs. 
24-811 and 1600-1622 comprising at least 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 200, 300, 
400, or 500 consecutive nucleotides of SEQ ID NOs. 24-811 and 1600-1622. Preferably, the 
fragments are novel fragments. Preferably the fragments include polynucleotides described in Table 
II, polynucleotides described in Table III, polynucleotides described in Table IV or portions thereof 
comprising at least 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400, or 500 consecutive 
nucleotides of the polynucleotides described in Tables II, III, or IV. Homologous sequences and 
fragments of SEQ ID NOs. 24-81 1 and 1600-1622 refer to a sequence having at least 99%, 98%, 
97%, 96%, 95%, 90%, 85%, 80%, or 75% homology to these sequences. Homology may be 
determined using any of the computer programs and parameters described in Example 18, including 
BLAST2N with the default parameters or with any modified parameters. Homologous sequences 
also include RNA sequences in which undines replace the thymines in the nucleic acid codes of SEQ 
ID NOs. 24-811 and 1600-1622. The homologous sequences may be obtained using any of the 
procedures described herein or may result from the correction of a sequencing error as described 
above. Preferably the homologous sequences and fragments of SEQ ID NOs. 24-811 and 1600- 
1622 include polynucleotides described in Table II, polynucleotides described in Table III, 
polynucleotides described in Table IV or portions thereof comprising at least 10, 15, 20, 25, 30, 35, 
40, 50, 75, 100, 150, 200, 300, 400, or 500 consecutive nucleotides of the polynucleotides described 
in Tables II, HI, or IV. It will be appreciated that the nucleic acid codes of SEQ ID NOs. 24-81 1 
and 1600-1622 can be represented in the traditional single character format (See the inside back 
cover of Stiyer, Lubert. Biochemistry, 3 rd edition. W. H Freeman & Co., New York.) or in any other 
format which records the identity of the nucleotides in a sequence. 
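
By way of illustration only, the derived sequences enumerated above (complementary sequences, RNA sequences in which uridines replace thymines, and fragments of a given number of consecutive nucleotides) can be generated from a stored nucleic acid code with a few string operations. The following Python sketch is not part of the original disclosure. The function names are hypothetical, the input is assumed to contain only the characters A, C, G and T, and reporting the complement in reverse orientation is a convention chosen for the example.

```python
from typing import List

# Translation table for the four unambiguous DNA characters (assumption:
# ambiguity codes such as N are not present in the input).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def to_rna(dna: str) -> str:
    """Return the RNA form of a DNA code (U in place of every T)."""
    return dna.upper().replace("T", "U")


def fragments(seq: str, length: int) -> List[str]:
    """Return every run of `length` consecutive characters of `seq`."""
    if length <= 0:
        raise ValueError("length must be positive")
    return [seq[i:i + length] for i in range(len(seq) - length + 1)]
```

A sequence shorter than the requested fragment length simply yields no fragments, which matches the "at least N consecutive nucleotides" wording.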

As used herein the term "polypeptide codes of SEQ ID NOs. 812-1599" encompasses the 
polypeptide sequences of SEQ ID NOs. 812-1599 which are encoded by the 5' ESTs of SEQ ID 
NOs. 24-811 and 1600-1622, polypeptide sequences homologous to the polypeptides of SEQ ID 
NOs. 812-1599, or fragments of any of the preceding sequences. Homologous polypeptide 
sequences refer to a polypeptide sequence having at least 99%, 98%, 97%, 96%, 95%, 90%, 85%, 
80%, or 75% homology to one of the polypeptide sequences of SEQ ID NOs. 812-1599. Homology 

Docket No.: GEN-T121C1 

Serial No. 09/471,276 

embodiment the computer system is a Sun Enterprise 1000 server (Sun Microsystems, Palo Alto, 
CA). The computer system preferably includes a processor for processing, accessing and 
manipulating the sequence data. The processor can be any well-known type of central processing 
unit, such as the Pentium III from Intel Corporation, or a similar processor from Sun, Motorola, 
Compaq or International Business Machines. 

Preferably, the computer system is a general purpose system that comprises the processor 
and one or more internal data storage components for storing data, and one or more data retrieving 
devices for retrieving the data stored on the data storage components. A skilled artisan can readily 
appreciate that any one of the currently available computer systems is suitable. 

In one particular embodiment, the computer system includes a processor connected to a bus 
which is connected to a main memory (preferably implemented as RAM) and one or more internal 
data storage devices, such as a hard drive and/or other computer readable media having data recorded 
thereon. In some embodiments, the computer system further includes one or more data retrieving 
devices for reading the data stored on the internal data storage devices. 

The data retrieving device may represent, for example, a floppy disk drive, a compact disk 
drive, a magnetic tape drive, etc. In some embodiments, the internal data storage device is a 
removable computer readable medium such as a floppy disk, a compact disk, a magnetic tape, etc. 
containing control logic and/or data recorded thereon. The computer system may advantageously 
include or be programmed by appropriate software for reading the control logic and/or the data from 
the data storage component once inserted in the data retrieving device. 

The computer system includes a display which is used to display output to a computer user. 
It should also be noted that the computer system can be linked to other computer systems in a 
network or wide area network to provide centralized access to the computer system. 

Software for accessing and processing the nucleotide sequences of the nucleic acid codes of 
SEQ ID NOs. 24-811 and 1600-1622, or the amino acid sequences of the polypeptide codes of SEQ 
ID NOs. 812-1599 (such as search tools, compare tools, and modeling tools, etc.) may reside in main 
memory during execution. 

In some embodiments, the computer system may further comprise a sequence comparer for 
comparing the above-described nucleic acid codes of SEQ ID NOs. 24-811 and 1600-1622 or 
polypeptide codes of SEQ ID NOS. 812-1599 stored on a computer readable medium to reference 
nucleotide or polypeptide sequences stored on a computer readable medium. A "sequence 
comparer" refers to one or more programs which are implemented on the computer system to 
compare a nucleotide or polypeptide sequence with other nucleotide or polypeptide sequences and/or 
compounds including but not limited to peptides, peptidomimetics, and chemicals stored within the 

One embodiment is a process in a computer for determining whether two sequences are 
homologous. The process begins at a start state and then moves to a state wherein a first sequence 
to be compared is stored to a memory. The second sequence to be compared is then stored to a 
memory at a subsequent state. The process then moves to a state wherein the first character in the first 
sequence is read and then to a state wherein the first character of the second sequence is read. It 
should be understood that if the sequence is a nucleotide sequence, then the character would 
normally be either A, T, C, G or U. If the sequence is a protein sequence, then it should be in the 
single letter amino acid code so that the first and second sequences can be easily compared. 

A determination is then made at a decision state whether the two characters are the same. 
If they are the same, then the process moves to a state wherein the next characters in the first and 
second sequences are read. A determination is then made whether the next characters are the 
same. If they are, then the process continues this loop until two characters are not the same. If a 
determination is made that the next two characters are not the same, the process moves to a 
decision state to determine whether there are any more characters in either sequence to read. 

If there are no more characters to read, then the process moves to a state wherein 
the level of homology between the first and second sequences is displayed to the user. The level 
of homology is determined by calculating the proportion of characters between the sequences that 
were the same out of the total number of characters in the first sequence. Thus, if every character 
in a first 100 nucleotide sequence aligned with every character in a second sequence, the 
homology level would be 100%. 
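
The calculation described above can be sketched as follows. This is a simplified, ungapped illustration rather than the complete process of the embodiment: it assumes the two sequences are already aligned position by position, it counts identical characters at the same position, and it divides by the length of the first sequence. The function name `homology_level` is hypothetical.

```python
def homology_level(first: str, second: str) -> float:
    """Percentage of positions in `first` that carry the same character
    in `second` at the same position (no gaps are introduced)."""
    if not first:
        raise ValueError("first sequence must not be empty")
    first = first.upper()
    second = second.upper()
    # zip() stops at the shorter sequence, so positions of `first` that
    # have no counterpart in `second` count as non-matching.
    matches = sum(1 for a, b in zip(first, second) if a == b)
    return 100.0 * matches / len(first)
```

With this definition, two identical 100-nucleotide sequences give 100.0, consistent with the example in the text. A result can then be compared with a chosen threshold such as 75% or 99%.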

Alternatively, the computer program may be a computer program which compares the 
nucleotide sequences of the nucleic acid codes of the present invention to reference nucleotide 
sequences in order to determine whether the nucleic acid code of SEQ ID NOs. 24-811 and 
1600-1622 differs from a reference nucleic acid sequence at one or more positions. Optionally such a 
program records the length and identity of inserted, deleted or substituted nucleotides with respect to 
the sequence of either the reference polynucleotide or the nucleic acid code of SEQ ID NOs. 24-811 
and 1600-1622. In one embodiment, the computer program may be a program which determines 
whether the nucleotide sequences of the nucleic acid codes of SEQ ID NOs. 24-811 and 1600-1622 
contain a biallelic marker or single nucleotide polymorphism (SNP) with respect to a reference 
nucleotide sequence. This single nucleotide polymorphism may comprise a single base substitution, 
insertion, or deletion, while this biallelic marker may comprise about one to ten consecutive bases 
substituted, inserted or deleted. 
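
As a minimal sketch of such a program, the Python function below records the position and identity of substituted nucleotides relative to a reference. It is an illustration under stated assumptions, not the program of the embodiment: the two sequences are assumed to be pre-aligned and of equal length, so insertions and deletions are not detected, and the name `substitutions` is hypothetical.

```python
from typing import List, Tuple


def substitutions(reference: str, query: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, reference base, query base) for every
    position at which the query differs from the reference."""
    if len(reference) != len(query):
        # Unequal lengths imply an insertion or deletion, which this
        # simplified sketch does not attempt to resolve.
        raise ValueError("sequences must be pre-aligned and of equal length")
    pairs = zip(reference.upper(), query.upper())
    return [(i + 1, r, q) for i, (r, q) in enumerate(pairs) if r != q]
```

A single entry in the returned list corresponds to a candidate single base substitution; handling insertions and deletions would require a gapped alignment step before this comparison.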

The nucleic acid codes of SEQ ID NOs. 24-811 and 1600-1622 or the polypeptide codes 
of SEQ ID NOs. 812-1599 may be stored and manipulated in a variety of data processor programs 
in a variety of formats. For example, the nucleic acid codes of SEQ ID NOs. 24-811 and 
1600-1622 or the polypeptide codes of SEQ ID NOs. 812-1599 may be stored as text in a word 
processing file, such as MicrosoftWORD or WORDPERFECT, or as an ASCII file in a variety of 
database programs familiar to those of skill in the art, such as DB2, SYBASE, or ORACLE. In 
addition, many computer programs and databases may be used as sequence comparers, identifiers, 
or sources of reference nucleotide or polypeptide sequences to be compared to the nucleic acid 
codes of SEQ ID NOs. 24-811 and 1600-1622 or the polypeptide codes of SEQ ID NOs. 
812-1599. The following list is intended not to limit the invention but to provide guidance to programs 
and databases which are useful with the nucleic acid codes of SEQ ID NOs. 24-811 and 1600-1622 
or the polypeptide codes of SEQ ID NOs. 812-1599. The programs and databases which may be 
used include, but are not limited to: MacPattern (EMBL), DiscoveryBase (Molecular Applications 
Group), GeneMine (Molecular Applications Group), Look (Molecular Applications Group), 
MacLook (Molecular Applications Group), BLAST and BLAST2 (NCBI), BLASTN and BLASTX 
(Altschul et al., J. Mol. Biol. 215: 403 (1990)), FASTA (Pearson and Lipman, Proc. Natl. Acad. Sci. 
USA, 85: 2444 (1988)), FASTDB (Brutlag et al., Comp. App. Biosci. 6:237-245, 1990), Catalyst 
(Molecular Simulations Inc.), Catalyst/SHAPE (Molecular Simulations Inc.), Cerius2.DBAccess 
(Molecular Simulations Inc.), HypoGen (Molecular Simulations Inc.), Insight II (Molecular 
Simulations Inc.), Discover (Molecular Simulations Inc.), CHARMm (Molecular Simulations Inc.), 
Felix (Molecular Simulations Inc.), DelPhi (Molecular Simulations Inc.), QuanteMM (Molecular 
Simulations Inc.), Homology (Molecular Simulations Inc.), Modeler (Molecular Simulations Inc.), 
ISIS (Molecular Simulations Inc.), Quanta/Protein Design (Molecular Simulations Inc.), WebLab 
(Molecular Simulations Inc.), WebLab Diversity Explorer (Molecular Simulations Inc.), Gene 
Explorer (Molecular Simulations Inc.), SeqFold (Molecular Simulations Inc.), the 
EMBL/Swissprotein database, the MDL Available Chemicals Directory database, the MDL Drug 
Data Report database, the Comprehensive Medicinal Chemistry database, Derwent's World Drug 
Index database, the BioByteMasterFile database, the Genbank database, and the Genseqn database. 
Many other programs and databases would be apparent to one of skill in the art given the present 
disclosure. 
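
As one hedged illustration of storing sequence codes as plain ASCII text, the sketch below writes named sequences to a text layout with a ">" header line per record (the layout commonly associated with the FASTA program) and reads them back. The choice of this layout, the record name `example_1`, and the function names are assumptions made for the example; the disclosure does not prescribe any particular file format.

```python
from typing import Dict


def to_fasta(records: Dict[str, str], width: int = 60) -> str:
    """Serialize {name: sequence} as ASCII text, wrapping each sequence
    at `width` characters per line."""
    lines = []
    for name, seq in records.items():
        lines.append(f">{name}")
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "\n".join(lines) + "\n"


def from_fasta(text: str) -> Dict[str, str]:
    """Parse text produced by to_fasta() back into {name: sequence}."""
    records: Dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


if __name__ == "__main__":
    stored = to_fasta({"example_1": "ACGT" * 20}, width=30)
    print(stored, end="")
    print(from_fasta(stored) == {"example_1": "ACGT" * 20})
```

The same text could equally be loaded into a database table or a word processing file; the point is only that the records survive a round trip through plain text.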

Motifs which may be detected using the above programs include sequences encoding 
leucine zippers, helix-turn-helix motifs, glycosylation sites, ubiquitination sites, alpha helices, 
and beta sheets, signal sequences encoding signal peptides which direct the secretion of the 
encoded proteins, sequences implicated in transcription regulation such as homeoboxes, acidic 
stretches, enzymatic active sites, substrate binding sites, and enzymatic cleavage sites. 
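
To make the idea of motif detection concrete, the following sketch scans a polypeptide code for candidate N-glycosylation sites. The pattern used (asparagine, any residue except proline, serine or threonine, any residue except proline) is the widely used consensus and is an assumption of this example; the disclosure names the motif class but does not specify a pattern, and the listed programs may apply different or additional criteria. The function name is hypothetical.

```python
import re
from typing import List, Tuple

# Consensus N-X-[S/T]-X where X is any residue other than proline.
# The lookahead lets overlapping candidate sites be reported.
_N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def find_n_glycosylation_sites(protein: str) -> List[Tuple[int, str]]:
    """Return (1-based position, matched residues) for each candidate
    site in a polypeptide given in single letter amino acid code."""
    protein = protein.upper()
    return [(m.start() + 1, m.group(1))
            for m in _N_GLYCOSYLATION.finditer(protein)]
```

A match only indicates a sequence consistent with the consensus; whether the site is actually modified depends on factors outside the sequence pattern.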



EXAMPLE 62 

Methods of Making Nucleic Acids 
The present invention also comprises methods of making the EST-related nucleic acids, 
fragments of EST-related nucleic acids, positional segments of the EST-related nucleic acids, or 
fragments of positional segments of the EST-related nucleic acids. The methods comprise 
sequentially linking together nucleotides to produce the nucleic acids having the preceding 
sequences. A variety of methods of synthesizing nucleic acids are known to those skilled in the 
art. 

In many of these methods, synthesis is conducted on a solid support. These include the 
3' phosphoramidite methods in which the 3' terminal base of the desired oligonucleotide is 
immobilized on an insoluble carrier. The nucleotide base to be added is blocked at the 5' 
hydroxyl and activated at the 3' hydroxyl so as to cause coupling with the immobilized nucleotide 
base. Deblocking of the new immobilized nucleotide compound and repetition of the cycle will 
produce the desired polynucleotide. Alternatively, polynucleotides may be prepared as described 
in U.S. Patent No. 5,049,656. In some embodiments, several polynucleotides prepared as 
described above are ligated together to generate longer polynucleotides having a desired 
sequence. 



EXAMPLE 63 

Methods of Making Polypeptides 
The present invention also comprises methods of making the polypeptides encoded by 
EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of the 
EST-related nucleic acids, or fragments of positional segments of the EST-related nucleic acids 
and methods of making the EST-related polypeptides, fragments of EST-related polypeptides, 
positional segments of EST-related polypeptides, or fragments of EST-related polypeptides. The 
methods comprise sequentially linking together amino acids to produce the polypeptides 
having the preceding sequences. In some embodiments, the polypeptides made by these methods 
are 150 amino acids or less in length. In other embodiments, the polypeptides made by these 
methods are 120 amino acids or less in length. 

INVENTORS Jean-Baptiste Dumas Milne Edwards, Aymeric Duclert, 

Jean-Yves Giordano 



It is certified that errors appear in the above-identified patent and that said Letters Patent is hereby 
corrected as shown below: 

Column 2: 

Line 28, "268:7314," should read --268:731-4,--. 

Column 4: 

Line 1, "6:236244," should read --6:236-244,--. 
Line 3, "Spel" should read --Spel--. 
Line 4, "Spel" should read --Spel--. 

Column 5: 

Line 19, "term" should read --term--. 

Column 7: 

Line 61, "extended,cDNAs" should read --extended cDNAs--. 

Column 8: 

Line 63, "SEQ D NOs." should read --SEQ ID NOs.--. 

Column 12: 

Line 35, "15541580" should read --1554-1580--. 

Column 20: 

Line 5, "gene D" should read --gene II--. 
Line 21, "polyerase" should read --polymerase--. 

Column 21: 

Line 38, "fill-length" should read --full-length--. 

Column 25: 

Line 8, "Ednian" should read --Edman--. 
Line 9, "N-terrrinar" should read --N-terminal--. 

Column 27: 

Line 41, "24728" should read --24-728--. 

Column 30: 

Line 11, "nucleobde(s)" should read --nucleotide(s)--. 

Column 31: 

SEQ ID NO: 415, "53" should read --S3--. 

Column 33: 

SEQ ID NO: 800, "km" should read --krn--. 

Column 34: 

Line 50, "polynucleotide" should read --polynucleotide--. 

Column 45: 

Line 34, "5'ESTs Consensus" should read --5'ESTs, Consensus--. 
Line 53, "the type" should read --the tissue type--. 

Column 54: 

Lines 12-14, "647 F:l, 643 F:l, 649 F:l" should read --647 F:l, 648 F:l, 649 F:l--. 

Column 59: 

Line 38, "by., a" should read --by a--. 

Column 60: 

Line 41, "it'may" should read --it may--. 

Column 61: 

Line 53, "bioinfomnatics" should read --bioinformatics--. 

Column 67: 

Line 31, "403410" should read --403-410--. 

Column 68: 

Line 1, "17:4941" should read --17:49-61--. 

Column 69: 

Line 61, "10%-20%=80%" should read --100%-20%=80%--. 

Column 70: 

Line 63, "Gene H" should read --Gene II--. 

Column 72: 

Line 29, "10:685686" should read --10:685-686--. 

Column 75: 

Line 6, "pXl contains" should read --pXTl contains--. 
Line 16, "Bglll" should read --BglII--. 

Column 77: 

Line 59, "Cytokine Cell" should read --Cytokine, Cell--. 

Column 78: 

Line 12, "145:17061712" should read --145:1706-1712--. 

Column 79: 

Line 36, "Takcai" should read --Takai--. 
Line 44, "134:536544" should read --134:536-544--. 
Line 47, "67:40624069" should read --67:4062-4069--. 
Line 57, "Darzynliewicz" should read --Darzynkiewicz--. 
Line 59, "7:659670" should read --7:659-670--. 

Column 81: 

Line 2, "tsplants" should read --transplants--. 



MAILING ADDRESS OF SENDER: 

Saliwanchik, Lloyd & Saliwanchik 
P.O. Box 142950 
Gainesville, FL 32614-2950 



PATENT NO. 6,822,072 

No. of additional copies 



UNITED STATES PATENT AND TRADEMARK OFFICE 
CERTIFICATE OF CORRECTION 

PATENT NO. 6,822,072 Page 5 of 7 

DATED November 23, 2004 

Column 85: 

Line 11, "tendonaigament-like" should read --tendon/ligament-like--. 
Line 20, "in viva" should read --in vivo--. 

Column 87: 

Line 55, "45:413419" should read --45:413-419--. 
Line 57, "35:467474" should read --35:467-474--. 

Column 89: 

Line 50, "Inclining" should read --killing--. 
Line 63, "behaviora" should read --behavioral--. 

Column 91: 

Line 48, "Chromalographia" should read --Chromatographia--. 

Column 95: 

Line 40, "GIC" should read --G/C--. 
Line 62, "Acids Positional" should read --Acids, Positional--. 

Column 97: 

Line 49, "probes arc" should read --probes are--. 



Column 98: 

Line 11, "P2" should read --P32--. 
Line 18, "32p" should read --32P--. 
Line 67, "p32" should read --P32--. 

Column 99: 

Line 64, "(FK)" should read --(TK)--. 

Column 100: 

Line 9, "Acids Positional" should read --Acids, Positional--. 

Column 101: 

Line 40, "Collected" should read --Colcemid--. 

Column 112: 

Line 23, "in, both" should read --in both--. 

Column 113: 

Line 11, "CIA" should read --CTA--. 
Line 19, "release 20" should read --release 2.0--. 

Column 115: 

Line 66, "antiseese" should read --antisense--. 

Column 119: 

Line 30, "region (b)" should read --region (h)--. 
Line 61, "271:2745627461" should read --271:27456-27461--. 
Line 63, "234:675680" should read --234:675-680--. 

Column 120: 

Line 56, "3 4 edition" should read --3rd edition--. 
Lines 66-67, "990%, 98%, 97%, 96% 95%, 900%, 85%" should read 
--99%, 98%, 97%, 96%, 95%, 90%, 85%--. 

Column 122: 

Line 18, "Continuing" should read --containing--. 

Column 124: 

Lines 60-61, "polynucleotides" should read --polynucleotide--. 

Column 127: 

Lines 39-40, "MicrosofRWORD" should read --MicrosoftWORD--. 

Line 49, "desired polynucleotides." should read --desired polynucleotide.-- 